I don't understand. Is it like LDS? What about its dimensions?

Take matrix multiplication, for example: why is the block 2D? And why does the function use global_x and global_y to get the thread ID? These are used to index *input, so I think they should be 1D.

There is no such thing as a block in OpenCL; I assume you are referring to a work-group. A work-group can have 1, 2, or 3 dimensions.

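The work-group is 2D in matrix multiplication because the algorithm is naturally two-dimensional: each work-item computes one element of the output matrix, and that element is identified by a row and a column. The buffers themselves are still flat 1D arrays; the kernel simply converts its 2D ID into a 1D offset (for example, row * width + column) before reading or writing.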
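To make the 2D-to-1D mapping concrete, here is a rough sketch in plain C, not real OpenCL. The two nested loops stand in for a 2D global range, and the parameters global_x and global_y play the role of the values a kernel would get from get_global_id(0) and get_global_id(1). The function names matmul_work_item and matmul_emulated are made up for this illustration, and I am assuming square n x n matrices stored in row-major order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one work-item of a 2D range.
   In an actual OpenCL kernel, global_x and global_y would come from
   get_global_id(0) and get_global_id(1) instead of being parameters. */
static void matmul_work_item(const float *a, const float *b, float *c,
                             size_t n, size_t global_x, size_t global_y)
{
    size_t col = global_x;  /* assumption: x selects the output column */
    size_t row = global_y;  /* assumption: y selects the output row */
    float sum = 0.0f;

    for (size_t k = 0; k < n; ++k) {
        /* The buffers are flat 1D arrays: element (row, k) of a
           row-major n x n matrix lives at offset row * n + k. */
        sum += a[row * n + k] * b[k * n + col];
    }
    c[row * n + col] = sum;
}

/* Emulates launching an n x n global range: one work-item per output
   element. On a device these iterations would run in parallel. */
void matmul_emulated(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t global_y = 0; global_y < n; ++global_y) {
        for (size_t global_x = 0; global_x < n; ++global_x) {
            matmul_work_item(a, b, c, n, global_x, global_y);
        }
    }
}
```

Calling matmul_emulated(a, b, c, n) fills c with the product of a and b. You could also write the same thing with a 1D range and recover the row and column with a division and a modulo, so the 2D layout is mostly a convenience that matches the shape of the problem.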