Archives Discussions

david_aiken · ‎11-22-2009

Section 3.4 of the NVIDIA OpenCL Programming Guide v. 2.3 describes warp-level synchronization as a way to avoid local memory barriers:

Because a warp executes one common instruction at a time, threads within a warp
are implicitly synchronized and this can be used to omit calls to the barrier()
function for better performance.

Is there a reference which helps to optimize OpenCL kernels for AMD GPUs similarly? From a quick search it seems the wavefront is the AMD equivalent to the warp. Does it also allow for implicit synchronization?

Is there an equivalent to the wavefront for the CPU device? I realize it has a substantially different architecture, but there is some value wrt emulating GPU execution for debugging purposes.

This seems more of a Khronos question, but does anyone know if this implicit synchronization capability is planned for the OpenCL spec at some point?

genaganna · ‎11-23-2009

Is there a reference which helps to optimize OpenCL kernels for AMD GPUs similarly? From a quick search it seems the wavefront is the AMD equivalent to the warp. Does it also allow for implicit synchronization?

You are right: the wavefront is AMD's equivalent of the warp. The wavefront size differs between hardware, for example 16, 32 or 64. There is no way to query the wavefront size from OpenCL, because the OpenCL spec does not define any implicit synchronization.
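To illustrate why the size matters, here is a toy CPU model in plain C (not OpenCL, and not how any real device schedules work). It is only a sketch under stated assumptions: `WAVE`, `GROUP` and all function names are hypothetical, the sizes are kept small for readability, and the "no barrier" function shows just one possible schedule, in which each wavefront runs every reduction step before the next wavefront starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for the model; real wavefronts are larger (16, 32 or 64). */
enum { WAVE = 4, GROUP = 16 };

/* Run one tree-reduction step for the lanes that belong to one wavefront. */
static void wave_step(int *scratch, size_t wave, size_t stride)
{
    for (size_t lane = wave * WAVE; lane < (wave + 1) * WAVE; ++lane)
        if (lane < stride)
            scratch[lane] += scratch[lane + stride];
}

/* Every wavefront finishes a step before any wavefront starts the next one,
   which is the ordering that barrier(CLK_LOCAL_MEM_FENCE) would enforce. */
int reduce_with_barrier(const int *in)
{
    int scratch[GROUP];
    assert(GROUP % WAVE == 0);
    for (size_t i = 0; i < GROUP; ++i)
        scratch[i] = in[i];
    for (size_t stride = GROUP / 2; stride > 0; stride /= 2)
        for (size_t wave = 0; wave < GROUP / WAVE; ++wave)
            wave_step(scratch, wave, stride);
    return scratch[0];
}

/* One schedule that is possible without barriers: wavefront 0 runs all of
   its steps before the other wavefronts have written their partial sums. */
int reduce_without_barrier(const int *in)
{
    int scratch[GROUP];
    assert(GROUP % WAVE == 0);
    for (size_t i = 0; i < GROUP; ++i)
        scratch[i] = in[i];
    for (size_t wave = 0; wave < GROUP / WAVE; ++wave)
        for (size_t stride = GROUP / 2; stride > 0; stride /= 2)
            wave_step(scratch, wave, stride);
    return scratch[0];
}
```

In this model the barrier version sums all 16 inputs, while the other schedule drops the partial sums that wavefront 1 had not yet produced. The steps in which every lane that reads or writes sits inside one wavefront (here `2 * stride <= WAVE`) are the only ones where lock-step execution alone would be enough, and that depends on a size OpenCL does not expose.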

Is there an equivalent to the wavefront for the CPU device? I realize it has a substantially different architecture, but there is some value wrt emulating GPU execution for debugging purposes.

OpenCL treats the CPU as a compute device in its own right, so we should not think of it as emulating GPU execution.

This seems more of a Khronos question, but does anyone know if this implicit synchronization capability is planned for the OpenCL spec at some point?

You will get a more appropriate answer if you post this on the Khronos forum.


implicit synchronization of memory access