
OpenCL

andyste1
Adept II

How to use pinned memory for reading from GPU?

I'm struggling to find examples of using pinned memory, especially when it comes to reading data from the GPU.

Assuming my kernel has an 'int*' argument (holding the "results" to be read back by the host), would the steps involved be something like the following?

// Create device buffer and pass to kernel

results_buf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, ...);

clSetKernelArg(kernel, ..., &results_buf);

// Create pinned host memory and map

pinned_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, ...);

mapped_buf = (cl_int *)clEnqueueMapBuffer(queue, pinned_buf, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, ...);

// Run kernel

clEnqueueNDRangeKernel(...);

// Read results

clEnqueueReadBuffer(queue, results_buf, ..., (void *)&mapped_buf[0], ...);

Am I on the right lines here? And what about clEnqueueUnmapMemObject(): do I need to call it at some point?

I will want to run this kernel (and read the buffer) repeatedly, so are there any considerations there? For example, will I have to call clEnqueueMapBuffer() each time?

dipak
Big Boss

Below are a couple of usage scenarios and the corresponding call sequences. I hope they help.

Typical call sequence using clEnqueueReadBuffer:

// called once

deviceBuffer = clCreateBuffer ( )

pinnedBuffer = clCreateBuffer ( CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR )

pinnedMemory = clEnqueueMapBuffer (pinnedBuffer, CL_MAP_WRITE )

// called multiple times

clEnqueueNDRangeKernel (deviceBuffer )

clEnqueueReadBuffer (deviceBuffer, pinnedMemory) // limited by PCI-e bandwidth

Application uses pinnedMemory directly

// called once

clEnqueueUnmapMemObject (pinnedBuffer, pinnedMemory)

Typical call sequence using clEnqueueCopyBuffer:

// called once

deviceBuffer = clCreateBuffer ( )

pinnedBuffer = clCreateBuffer ( CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR )

// called multiple times

clEnqueueNDRangeKernel (deviceBuffer )

clEnqueueCopyBuffer ( deviceBuffer, pinnedBuffer ) // limited by PCI-e bandwidth

pinnedMemory = clEnqueueMapBuffer ( pinnedBuffer, CL_MAP_READ ) // almost a no-op, since the memory is already pinned

Application uses pinnedMemory directly

clEnqueueUnmapMemObject ( pinnedBuffer, pinnedMemory ) // no-op, since the buffer was mapped for reading only

Thanks.
