14 Replies Latest reply on Feb 9, 2015 4:00 AM by binghy

    Host<->Device Memory Transfer during Kernel-Run?




      I am trying to transfer some data between the host and an OpenCL device while a kernel is running.

      Using CL_MEM_USE_PERSISTENT_MEM_AMD, I figured out that I can create a buffer and get a pointer to it;

      this works on my CPU but not on the GPU (the program crashes).


      Is it even possible to transfer data to the GPU while a kernel is running?


      My System:

      Ubuntu Linux 12.04 64 bit, AMD X4 CPU, AMD HD 7750, Catalyst 12.6


      Thanks in advance,


        • Re: Host<->Device Memory Transfer during Kernel-Run?

          Here are my steps:


          1. Create a buffer with clCreateBuffer, using the flags CL_MEM_READ_WRITE | CL_MEM_USE_PERSISTENT_MEM_AMD


          2. Get Pointer to Buffer with clEnqueueMapBuffer


          3. Set Value of Pointer to x


          4. Unmap Buffer with clEnqueueUnmapMemObject


          5. Set Kernel Args and run Kernel


          6. do some stuff on Host


          7. Set Value of Pointer to y


          8. wait for Kernel run to finish


          As I said, step 7 works on the CPU but not on the GPU.




            • Re: Host<->Device Memory Transfer during Kernel-Run?

              Hi Srdja,

                  I think it is currently impossible to transfer data between the CPU and the GPU while a kernel is running. I also suspect that the kernel has already finished before step 7.

              • Re: Host<->Device Memory Transfer during Kernel-Run?

                There is a special sample in the SDK which covers this use case, but it needs zero-copy support, which on Linux is supported only for Southern Islands, aka 7xxx GPUs.

                • Re: Host<->Device Memory Transfer during Kernel-Run?

                  Mid-kernel CPU writes do not work because CPU writes to persistent memory are not GPU cache coherent. That is, if the address happens to be cached in the GPU L1/L2, the cache lines will not be invalidated. It is safe to write to persistent memory only outside of kernel execution boundaries, where the associated caches are flushed and invalidated. This is also true for host-memory zero-copy buffers.

                  There is a way to make CPU writes visible to the GPU mid-kernel; however, it is not for everyone.

                  The GCN load/store ISA instructions (MUBUF/MTBUF) have a bit called 'SLC'. SLC stands for System Level Coherent. If you set it, the GPU bypasses its L1/L2 caches for that specific load/store operation, which makes the access coherent with the CPU.

                  You can patch your kernel ISA to flip the bit on.

                  The OCL compiler will set it in a single case: atomic operations that read back the previous value from memory. You can verify this behavior by inspecting the ISA when compiling kernels in CodeXL analysis mode.
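
                  To make the coherence argument above a little more concrete, here is a toy C model. It is only a sketch, not OpenCL and not real hardware: it assumes a single memory word and a single GPU cache line, and every name in it (ToyBuffer, cpu_store, gpu_load, gpu_load_slc, kernel_boundary, demo) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (an assumption for illustration, not real hardware):
 * one word of persistent memory plus one GPU cache line holding a copy. */
typedef struct {
    int  memory;      /* backing store that the CPU writes to        */
    int  cached;      /* copy held in the GPU L1/L2                   */
    bool cache_valid; /* true while the cached copy may still be used */
} ToyBuffer;

/* CPU store: updates memory but does NOT invalidate the GPU cache line. */
void cpu_store(ToyBuffer *b, int value) {
    b->memory = value;
}

/* Ordinary GPU load: served from the cache whenever the line is valid. */
int gpu_load(ToyBuffer *b) {
    if (!b->cache_valid) {
        b->cached = b->memory; /* cache miss: fetch from memory */
        b->cache_valid = true;
    }
    return b->cached;
}

/* GPU load modelled with the SLC bit set: bypasses the cache entirely. */
int gpu_load_slc(const ToyBuffer *b) {
    return b->memory;
}

/* Kernel boundary: the associated caches are flushed and invalidated. */
void kernel_boundary(ToyBuffer *b) {
    b->cache_valid = false;
}

/* Walks through the situation from the question on the toy model. */
void demo(void) {
    ToyBuffer b = { 0, 0, false };

    cpu_store(&b, 1);              /* host writes x before the launch      */
    assert(gpu_load(&b) == 1);     /* kernel reads x and caches it         */

    cpu_store(&b, 2);              /* host writes y while the kernel runs  */
    assert(gpu_load(&b) == 1);     /* cached load still returns stale x    */
    assert(gpu_load_slc(&b) == 2); /* cache-bypassing load sees y          */

    kernel_boundary(&b);           /* kernel ends, caches are invalidated  */
    assert(gpu_load(&b) == 2);     /* the next kernel sees y               */
}
```

                  Calling demo() runs the whole sequence; the asserts mark the points where the cached load returns the old value while the cache-bypassing load already returns the new one.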



                • Re: Host<->Device Memory Transfer during Kernel-Run?

                  You can't modify the data during the kernel execution, if that's what you're trying to achieve.


                  The API won't even allow you to do it:


                  e.g. you can only write to a buffer through its pointer while it is mapped, and you can't pass a mapped buffer to clEnqueueNDRangeKernel.


                  Even with zero-copy buffers there are things like caches to flush and synchronise, MMU tables, and so forth.


                  Your example is invalid API use, as you cannot access the pointer after calling unmap. Unmap should be treated the same way as free(): once you've called it, that pointer cannot be re-used.


                  So, you can transfer memory whilst another kernel is in progress, but you cannot modify the memory that the executing one is using.  And this is a good thing.