1 Reply Latest reply on Nov 23, 2009 4:35 AM by genaganna

    implicit synchronization of memory access

    david_aiken

      Section 3.4 of the NVIDIA OpenCL Programming Guide v. 2.3 describes relying on warp-level synchronization to avoid local memory barriers:

      Because a warp executes one common instruction at a time, threads within a warp
      are implicitly synchronized and this can be used to omit calls to the barrier()
      function for better performance.
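      The rule behind the quoted advice can be sketched in plain C for the common case of a tree reduction in local memory. This is a hypothetical model (the function name and the reduction itself are assumptions, not code from the NVIDIA guide). A step with stride s is performed by work-items tid < s. Once s is no larger than the warp width w, the work-items that wrote the data and the ones that read it in the next step all belong to the same warp, so the barrier between those steps can be dropped under the lockstep assumption.

```c
/* Hypothetical helper: counts the barrier() calls a tree reduction over n
   local-memory elements still needs when steps confined to a single warp of
   width w run without barriers. n and w are assumed to be powers of two.
   The barrier that follows the initial load into local memory is not counted. */
int barriers_needed(int n, int w) {
    int count = 0;
    for (int s = n / 2; s >= 1; s >>= 1) {
        /* After this step, elements [0, s) were written by work-items 0..s-1,
           and the next step reads them. If s > w, those writers span several
           warps, so a barrier is required before the next step. */
        if (s > w)
            ++count;
    }
    return count;
}
```

      For a 256-element reduction, barriers_needed(256, 32) returns 2, barriers_needed(256, 64) returns 1, and barriers_needed(256, 1), which models having no implicit synchronization at all, returns 7. In an actual kernel the barrier-free steps are usually written through a volatile pointer to local memory so that the compiler reloads each value instead of keeping it in a register; this count ignores that detail.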

      Is there a reference which helps to optimize OpenCL kernels for AMD GPUs similarly? From a quick search it seems the wavefront is the AMD equivalent to the warp. Does it also allow for implicit synchronization?

      Is there an equivalent to the wavefront for the CPU device? I realize it has a substantially different architecture, but there would be some value in emulating GPU execution for debugging purposes.

      This seems more of a Khronos question, but does anyone know if this implicit synchronization capability is planned for the OpenCL spec at some point?

          genaganna

           

          Is there a reference which helps to optimize OpenCL kernels for AMD GPUs similarly? From a quick search it seems the wavefront is the AMD equivalent to the warp. Does it also allow for implicit synchronization?

          You are right: the wavefront is AMD's equivalent of the warp. The wavefront size differs across hardware (for example 16, 32, or 64 work-items). There is no way to query the wavefront size through OpenCL, because OpenCL does not define any implicit synchronization.
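          To show why a width that cannot be queried matters, here is a hypothetical C simulation (not AMD or NVIDIA code; WG_SIZE, warp_step, and reduce_sum are illustrative names). It models a 256-item work-group that executes in lockstep groups of hw_width, runs a reduction whose barrier-free tail was written for a width of assumed_width, and schedules the warps in one legal order (index order). It ignores compiler effects such as caching local memory in registers.

```c
#include <string.h>

#define WG_SIZE 256 /* hypothetical work-group size (a power of two) */

/* One reduction step (a[tid] += a[tid + stride] for tid < stride) executed by
   a single warp in lockstep: all active lanes load before any lane stores. */
void warp_step(int *a, int warp, int hw_width, int stride) {
    int tmp[WG_SIZE];
    int first = warp * hw_width;
    for (int lane = 0; lane < hw_width; ++lane) {
        int tid = first + lane;
        if (tid < stride) tmp[lane] = a[tid] + a[tid + stride];
    }
    for (int lane = 0; lane < hw_width; ++lane) {
        int tid = first + lane;
        if (tid < stride) a[tid] = tmp[lane];
    }
}

/* Reduction written for an assumed wavefront width, run on hardware whose
   real width is hw_width (which must divide WG_SIZE). Strides above
   assumed_width are separated by barriers; the remaining strides have no
   barriers, so each hardware warp simply runs them on its own. */
int reduce_sum(const int *in, int hw_width, int assumed_width) {
    int a[WG_SIZE];
    memcpy(a, in, sizeof a);
    int warps = WG_SIZE / hw_width;
    int s = WG_SIZE / 2;
    /* Each step finishes in every warp before the next begins (a barrier). */
    for (; s > assumed_width; s >>= 1)
        for (int w = 0; w < warps; ++w) warp_step(a, w, hw_width, s);
    /* Barrier-free tail: warp 0 may run ahead of the other warps. */
    for (int w = 0; w < warps; ++w)
        for (int t = s; t >= 1; t >>= 1) warp_step(a, w, hw_width, t);
    return a[0];
}
```

          With all 256 inputs equal to 1, reduce_sum(ones, 64, 64) and reduce_sum(ones, 32, 32) both return 256, and so does the conservative reduce_sum(ones, 64, 32). Code written for a 64-wide wavefront running on 16-wide hardware, reduce_sum(ones, 16, 64), returns 128 under this schedule, because warp 0 reads partial sums that other warps have not written yet.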

           

          Is there an equivalent to the wavefront for the CPU device? I realize it has a substantially different architecture, but there would be some value in emulating GPU execution for debugging purposes.

          OpenCL treats the CPU as a compute device in its own right, so running a kernel on it should not be considered an emulation of GPU execution.

           

           

          This seems more of a Khronos question, but does anyone know if this implicit synchronization capability is planned for the OpenCL spec at some point?

           

          You will get a more appropriate answer if you post this on the Khronos forum.