I'm struggling to find examples of using pinned memory, especially when it comes to reading data from the GPU.
Assuming my kernel has an 'int*' argument (containing the "results" to be read back by the host), would the steps involved be something like the following?
// Create device buffer and pass to kernel
results_buf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, ...);
clSetKernelArg(kernel, ..., &results_buf);
// Create pinned host memory and map
pinned_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, ...);
mapped_buf = (cl_int *)clEnqueueMapBuffer(queue, pinned_buf, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, ...);
// Run kernel
clEnqueueNDRangeKernel(...);
// Read results
clEnqueueReadBuffer(queue, results_buf, ..., (void *)&mapped_buf[0], ...);
Am I on the right lines here? What about clEnqueueUnmapMemObject() - do I need to use this at some point?
I will want to run this kernel (and read the buffer) repeatedly, so are there any considerations there, e.g. will I have to call clEnqueueMapBuffer() each time?