I believe the OpenCL equivalent of a CUDA block is a work group. In CUDA we have to define the block size explicitly, and I just heard in a lecture that in OpenCL we do not need to define the work-group size because the OpenCL runtime picks the optimal one itself. Is that really true?
I believe OpenCL still provides a way to define it explicitly.
Could someone discuss this in a bit more detail?
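Yes, that's roughly right: specifying the work-group size is optional, but you can still set it yourself. If you leave it out (by passing NULL as the local_work_size argument of clEnqueueNDRangeKernel), the runtime picks a size for you. The catch is that you don't know in advance which size it will select, so you may not know how much local memory to allocate for each work group.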
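To make the local-memory point concrete, here is a minimal sketch in C of the arithmetic involved. The helper name local_scratch_bytes is hypothetical (it is not part of OpenCL), and it assumes the kernel wants a fixed number of bytes of __local scratch per work item:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an OpenCL API call.
 * Returns the number of bytes of __local scratch one work group needs,
 * assuming each work item owns `bytes_per_item` bytes of it. */
static size_t local_scratch_bytes(size_t work_group_size, size_t bytes_per_item)
{
    assert(work_group_size > 0);  /* a work group has at least one item */
    return work_group_size * bytes_per_item;
}
```

With an explicit work-group size of 64 and one float per work item, this gives the byte count you would pass as arg_size to clSetKernelArg (with a NULL arg_value) for a __local kernel argument. If you let the runtime choose the size instead, you have no value to plug in for work_group_size at this point.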
On the other hand, in my experience 64 is almost always the right answer on AMD hardware. When the work group is the same size as the hardware thread (the wavefront, 64 work items), the whole group already executes together, so you remove most of the synchronization overhead between work items and gain performance overall.
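If you do fix the work-group size at 64, the global size usually has to be padded, since OpenCL 1.x requires the global work size to be evenly divisible by an explicitly given local work size. A minimal sketch of that rounding in C follows; WAVEFRONT_SIZE and round_up_global_size are names made up for this example, and the value 64 is simply the figure quoted above:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from the post: 64 work items per wavefront on AMD. */
#define WAVEFRONT_SIZE 64

/* Hypothetical helper, not an OpenCL API call.
 * Rounds the number of items up to the next multiple of local_size,
 * so the result can be used as a global work size. */
static size_t round_up_global_size(size_t n_items, size_t local_size)
{
    assert(local_size > 0);
    size_t groups = (n_items + local_size - 1) / local_size;  /* ceiling division */
    return groups * local_size;
}
```

For 1000 items and a local size of WAVEFRONT_SIZE this yields a global size of 1024, so the kernel itself needs a bounds check (for example, returning early when get_global_id(0) is not below the real item count) to keep the 24 padding work items from touching memory.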