Archives Discussions

Raistmer · 09-17-2011

What is the best buffer for this?

Many of my kernels need to return a small (<1 kB) vector of flags that determines whether a subsequent GPU-to-host memory transfer is needed.

Do I understand correctly that the best memory buffer for this vector would be pre-pinned memory allocated on the host and accessed by the host via map/unmap commands? The GPU should also use that buffer directly, i.e.:

1. buffer = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) (without the CL_MEM_READ_ONLY flag)
2. address = clEnqueueMapBuffer( buffer )
3. memset( address )
4. clEnqueueUnmapMemObject( buffer )
5. clEnqueueNDRangeKernel( buffer )
6. address = clEnqueueMapBuffer( buffer )
7. read by CPU to check if flag==1
8. goto 3.
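The host-side work in steps 3 and 7 can be sketched in plain C. This is a minimal sketch, assuming the flag vector holds one 32-bit unsigned flag per element (`uint32_t`, the same width as OpenCL's `cl_uint`) and that `flags` is the host pointer returned by `clEnqueueMapBuffer`. The helper names `reset_flags` and `collect_set_flags` are hypothetical, and the OpenCL calls themselves are left out so the logic compiles and can be checked on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 3 (hypothetical helper): clear the flag vector through the mapped
 * host pointer, before the buffer is unmapped and the kernel is enqueued.
 * Assumes one uint32_t flag per element. */
static void reset_flags(uint32_t *flags, size_t n)
{
    memset(flags, 0, n * sizeof *flags);
}

/* Step 7 (hypothetical helper): scan the mapped flag vector after the
 * kernel has finished. Writes the index of every nonzero flag into `hits`
 * (which must have room for n entries) and returns how many were found.
 * A return value of 0 means no follow-up GPU-to-host transfer is needed. */
static size_t collect_set_flags(const uint32_t *flags, size_t n, size_t *hits)
{
    assert(flags != NULL && hits != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (flags[i] != 0) {
            hits[count++] = i;
        }
    }
    return count;
}
```

Since only about one launch in a few hundred is expected to set a flag, the common path is a single linear scan of under 1 kB that returns 0, after which the loop goes back to step 3 with no further transfer.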

Also, should buffer = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY) be used instead of buffer = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR)?

Which is better, given that only about one launch in a few hundred will change a flag from 0 to 1: speeding up GPU access by using uncached memory, or leaving the memory cached to speed up the subsequent check by the CPU?

genaganna · 09-18-2011

Your buffer type is correct for this situation. One more thing: you should use CL_MEM_WRITE_ONLY, since the kernel writes into this buffer.

Flags passing from GPU to host