I assumed that LDS memory access is 1-D, and I believe this was also the case for the LDS in CAL and Brook.
Why, then, do the OpenCL SDK samples such as MatrixMult pass two-element arrays for the local thread count (that is, a 2-D work-group size) when invoking the kernel with "clEnqueueNDRangeKernel"?
Also, in most of my tests I have not seen a major improvement from using 2-D memory in general, unlike what used to happen in Brook. In the end I seem to pay the overhead of computing a 1-D offset = width * y + x in my kernel anyway. In fact, all the samples use 1-D offsets. Do I need to bother with a 2-D data arrangement at all?
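To make the offset arithmetic concrete, here is a minimal host-side C sketch of what I mean. It is not taken from any SDK sample: the names flat_offset and fill_2d_as_1d are hypothetical, and the nested loops only stand in for the (x, y) pairs that get_global_id(0) and get_global_id(1) would return under a 2-D NDRange.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row-major offset of element (x, y) in a buffer
   that is `width` elements wide. Same arithmetic as
   offset = width * y + x in the question. */
static size_t flat_offset(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return width * y + x;
}

/* Hypothetical host-side stand-in for a 2-D NDRange: visits every (x, y)
   pair in row-major order and stores the visiting order into a flat
   1-D buffer, so the 2-D index space still lands in 1-D memory. */
static void fill_2d_as_1d(int *buf, size_t width, size_t height)
{
    int counter = 0;
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            buf[flat_offset(x, y, width)] = counter++;
}
```

Calling fill_2d_as_1d(buf, 4, 3) on a 12-element buffer writes each element exactly once. So, as far as I can tell, the 2-D indices only change how the work-items are numbered; the underlying storage stays 1-D.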