3 Replies Latest reply on Sep 24, 2011 5:30 AM by notzed

    how to do optimized memcpy in kernel for opencl on CPU?

    zhuzxy

      On a GPU we can use multiple work-items to do the copy, but on a CPU the number of work-items may be very small. What is the best practice for memcpy in that case? For example, to copy 17 lines of 17 chars each, what is the best approach in theory? Copy the bytes one by one?

        • how to do optimized memcpy in kernel for opencl on CPU?
          LeeHowes

          How would you do it normally on the CPU? I would assume an unrolled SSE loop per core. So do that in OpenCL too, but for your convenience you can use the vector types instead of SSE intrinsics.
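
          A minimal sketch of the idea for the 17x17 case from the question, written in plain C as a stand-in for kernel code. The function name `copy_rows_chunked` and the stride parameters are hypothetical. Each row is copied as one 16-byte block plus a scalar tail; inside an OpenCL kernel the 16-byte step would correspond to a `vload16`/`vstore16` pair on `uchar` data, which is an assumption about how you would port it, not something benchmarked here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `rows` rows of `width` bytes each between two
 * buffers that may have different strides (bytes per line). */
void copy_rows_chunked(unsigned char *dst, size_t dst_stride,
                       const unsigned char *src, size_t src_stride,
                       size_t width, size_t rows)
{
    for (size_t y = 0; y < rows; ++y) {
        unsigned char *d = dst + y * dst_stride;
        const unsigned char *s = src + y * src_stride;
        size_t x = 0;

        /* Wide part: one 16-byte block per step. A fixed-size memcpy like
         * this is usually compiled to a single unaligned vector load/store.
         * In an OpenCL kernel this step would be vload16/vstore16. */
        for (; x + 16 <= width; x += 16)
            memcpy(d + x, s + x, 16);

        /* Tail: leftover bytes (1 byte per row when width is 17). */
        for (; x < width; ++x)
            d[x] = s[x];
    }
}
```

          For 17-byte rows this does one wide copy and one byte copy per row instead of 17 byte copies. Whether that actually wins on such a small block is something to measure.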

          It depends on what you're trying to achieve, though. If you're just doing a massive memcpy you might be better off creating a native kernel that calls an external library with a parallel memcpy routine.
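
          A rough sketch of how a parallel memcpy routine might split a large copy across cores, again in plain C. The names `copy_slice` and `sliced_memcpy` are hypothetical, and the 64-byte cache-line size is an assumption. The driver below runs the slices one after another for illustration; a real routine would hand each slice to its own thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compute the byte range [*off, *off + *len) that
 * worker `idx` out of `workers` (>= 1) should copy. Slice boundaries fall
 * on multiples of 64 bytes (assumed cache-line size) so that two workers
 * do not write to the same line, except at the very end of the buffer. */
void copy_slice(size_t total, size_t workers, size_t idx,
                size_t *off, size_t *len)
{
    size_t lines = (total + 63) / 64;             /* cache lines to cover  */
    size_t per = (lines + workers - 1) / workers; /* lines per worker, rounded up */
    size_t begin = idx * per * 64;
    size_t end = begin + per * 64;
    if (begin > total) begin = total;             /* clamp to buffer size  */
    if (end > total) end = total;
    *off = begin;
    *len = end - begin;
}

/* Sequential stand-in for the parallel driver: each loop iteration is the
 * work one thread would do. */
void sliced_memcpy(void *dst, const void *src, size_t total, size_t workers)
{
    for (size_t i = 0; i < workers; ++i) {
        size_t off, len;
        copy_slice(total, workers, i, &off, &len);
        memcpy((unsigned char *)dst + off,
               (const unsigned char *)src + off, len);
    }
}
```

          With `total = 1000` and `workers = 4`, the slices start at offsets 0, 256, 512 and 768, and the last one is 232 bytes long.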

          • how to do optimized memcpy in kernel for opencl on CPU?
            twentz

            I've never benchmarked it, but there's an OpenCL kernel function called "async_work_group_copy", and my guess is that it optimizes memory movement (although it would also require synchronization, depending on your program).

              • how to do optimized memcpy in kernel for opencl on CPU?
                notzed

                 

                Originally posted by: twentz I've never benchmarked it, but there's an OpenCL kernel function called "async_work_group_copy", and my guess is that it optimizes memory movement (although it would also require synchronization, depending on your program).

                 



                I would guess that async_work_group_copy is really just a way to access the asynchronous DMA transfer system on a CELL BE: it pretty much maps 1:1 to a simplified view of the hardware interface (and without it, CELL BE is pretty much knee-capped). I imagine every other implementation provides it just for completeness, and it might not necessarily be as optimised, or even asynchronous.

                Although there's nothing to say it couldn't be.