1 Reply Latest reply on Sep 18, 2011 2:53 AM by genaganna

    Flags passing from GPU to host

    Raistmer
      what is the best buffer for this?

      Many of my kernels need to return a small (<1 kB) vector of flags that determines whether a subsequent GPU-to-host memory transfer is needed.

      Do I understand correctly that the best memory buffer for this vector would be
      pre-pinned memory allocated on the host and accessed by the host via map/unmap commands, with the GPU using that buffer directly?
      I.e.:

      1. buffer = clCreateBuffer( CL_MEM_ALLOC_HOST_PTR ) (but w/o the CL_MEM_READ_ONLY flag)
      2. address = clEnqueueMapBuffer( buffer )
      3. memset( address )
      4. clEnqueueUnmapMemObject( buffer )
      5. clEnqueueNDRangeKernel( buffer )
      6. address = clEnqueueMapBuffer( buffer )
      7. read by CPU to check if flag == 1
      8. goto 3.

      Also, should buffer = clCreateBuffer( CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY ) perhaps be used instead of buffer = clCreateBuffer( CL_MEM_ALLOC_HOST_PTR )?

      Which is better, given that only about 1 launch in a few hundred will change a flag from 0 to 1: speeding up GPU access by using uncached memory, or leaving it cached to speed up the subsequent check by the CPU?
        • Flags passing from GPU to host
          genaganna

           



          Your buffer type is correct for this situation. One more thing: you should use CL_MEM_WRITE_ONLY, since the kernel writes into this buffer.