Is there a reference which helps to optimize OpenCL kernels for AMD GPUs similarly? From a quick search it seems the wavefront is the AMD equivalent to the warp. Does it also allow for implicit synchronization?
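You are right, the wavefront is the AMD equivalent of the warp. The wavefront size differs between hardware generations, for example 16, 32 or 64 work-items. Core OpenCL has no query that returns the wavefront size as such, because the specification does not define wavefronts or any implicit synchronization between work-items.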
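Since the width varies by device, host code is usually safer treating it as a runtime parameter rather than a constant. Below is a minimal sketch in plain C, assuming the width has already been obtained somehow (for instance, the `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` hint from `clGetKernelWorkGroupInfo` in OpenCL 1.1 and later often matches it, but it is only a performance hint, not a synchronization guarantee). The function names are hypothetical helpers, not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`.
 * Useful because OpenCL 1.x requires the global work size to be a
 * multiple of the local work size. */
size_t round_up_to_multiple(size_t n, size_t multiple)
{
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Hypothetical helper: number of wavefronts needed to cover one
 * work-group. A partially filled wavefront still occupies a whole one,
 * so work-group sizes that are multiples of the width avoid idle lanes. */
size_t wavefronts_per_group(size_t work_group_size, size_t wavefront_width)
{
    assert(wavefront_width > 0);
    return (work_group_size + wavefront_width - 1) / wavefront_width;
}
```

For example, 1000 work-items with a work-group size of 64 would be launched with a global size of 1024, and the kernel then needs a bounds check such as `if (get_global_id(0) < n)` for the padded work-items. Correctness should still rely on explicit `barrier()` calls rather than on the wavefront width.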
Is there an equivalent to the wavefront for the CPU device? I realize it has a substantially different architecture, but there is some value wrt emulating GPU execution for debugging purposes.
OpenCL treats the CPU as a compute device in its own right, so running a kernel on the CPU device should not be thought of as emulating GPU execution.
This seems more of a Khronos question, but does anyone know if this implicit synchronization capability is planned for the OpenCL spec at some point?
You will get a more appropriate answer if you post this question on the Khronos forum.