Flags passing from GPU to host

Discussion created by Raistmer on Sep 17, 2011
Latest reply on Sep 18, 2011 by genaganna
What is the best buffer for this?

Many of my kernels need to return a small (<1 kB) vector of flags that determines whether a subsequent GPU-to-host memory transfer is needed.

Do I understand correctly that the best memory buffer for this vector would be
pre-pinned memory allocated on the host and accessed by the host via map/unmap commands? The GPU would also use that buffer directly.

1. buffer = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR) (but without the CL_MEM_READ_ONLY flag)
2. address = clEnqueueMapBuffer( buffer )
3. memset( address )
4. clEnqueueUnmapMemObject( buffer )
5. clEnqueueNDRangeKernel( buffer )
6. address = clEnqueueMapBuffer( buffer )
7. the CPU reads the mapped memory to check whether flag == 1
8. goto 3.

Alternatively, should buffer = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY) be used instead of buffer = clCreateBuffer(CL_MEM_ALLOC_HOST_PTR)?

Which is better, given that only one launch in a few hundred will change a flag from 0 to 1: speeding up GPU access by using uncached memory, or leaving the memory cached to speed up the subsequent check by the CPU?