I don't understand. Is it like LDS? What about its dimensions?
Take matrix multiplication, for example: why is the block 2D? And why does the function use global_x and global_y to get the thread ID? These are using *input, so I think they should be 1D.
There is no such thing as a block in OpenCL; I assume you are referring to a work-group. A work-group can have 1, 2, or 3 dimensions.
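The work-group is 2D in matrix multiplication because the algorithm maps naturally onto it: the output is a 2D matrix, and each work-item computes one (row, column) element of it. The buffers themselves are still 1D memory, so the kernel combines the two global IDs into a single linear index when it reads or writes through the pointer.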