OpenCL

andyste1 · 11-14-2018

I'm struggling to find examples of using pinned memory, especially when it comes to reading data from the GPU.

Assuming my kernel has an 'int*' argument (holding the results to be read back by the host), would the steps be something like the following?

// Create device buffer and pass to kernel

results_buf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, ...);

clSetKernelArg(kernel, ..., sizeof(cl_mem), &results_buf);

// Create pinned host memory and map

pinned_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, ...);

mapped_buf = (cl_int *)clEnqueueMapBuffer(queue, pinned_buf, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, ...);

// Run kernel

clEnqueueNDRangeKernel(...);

// Read results

clEnqueueReadBuffer(queue, results_buf, ..., (void *)&mapped_buf[0], ...);

Am I on the right lines here? What about clEnqueueUnmapMemObject()? Do I need to call it at some point?

I will want to run this kernel repeatedly (and read the buffer each time). Are there any considerations there, e.g. will I have to call clEnqueueMapBuffer() each time?

dipak · 11-15-2018

Below are a couple of usage scenarios and their corresponding call sequences. I hope they help.

Typical call sequence using clEnqueueReadBuffer:
// called once
deviceBuffer = clCreateBuffer ( )
pinnedBuffer = clCreateBuffer ( CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR )
pinnedMemory = clEnqueueMapBuffer (pinnedBuffer, CL_MAP_WRITE )
// called multiple times
clEnqueueNDRangeKernel (deviceBuffer )
clEnqueueReadBuffer (deviceBuffer, pinnedMemory) // limited by PCIe bandwidth
Application uses pinnedMemory directly
// called once
clEnqueueUnmapMemObject (pinnedBuffer, pinnedMemory)

Typical call sequence using clEnqueueCopyBuffer:
// called once
deviceBuffer = clCreateBuffer ( )
pinnedBuffer = clCreateBuffer ( CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR )
// called multiple times
clEnqueueNDRangeKernel (deviceBuffer )
clEnqueueCopyBuffer ( deviceBuffer, pinnedBuffer ) // limited by PCIe bandwidth
pinnedMemory = clEnqueueMapBuffer ( pinnedBuffer, CL_MAP_READ ) // almost a no-op, since the memory is already pinned
Application uses pinnedMemory directly
clEnqueueUnmapMemObject ( pinnedBuffer, pinnedMemory ) // a no-op, since it was mapped for reading only

Thanks.

How to use pinned memory for reading from GPU?