1 Reply Latest reply on Nov 15, 2018 5:05 AM by dipak

    How to use pinned memory for reading from GPU?

    andyste1

      I'm struggling to find examples of using pinned memory, especially when it comes to reading data from the GPU.

      Assuming my kernel has an 'int*' argument (containing the "results" to be read back by the host), would the steps involved be something like the following?

       

      // Create device buffer and pass to kernel

      results_buf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, ...)

      clSetKernelArg(kernel, ..., &results_buf)

       

      // Create pinned host memory and map

      pinned_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, ...)

      mapped_buf = (cl_int *)clEnqueueMapBuffer(queue, pinned_buf, CL_TRUE, CL_MAP_READ | CL_MAP_WRITE, 0, ...);

       

      // Run kernel

      clEnqueueNDRangeKernel(...)

       

      // Read results

      clEnqueueReadBuffer(queue, results_buf, ..., (void *)&mapped_buf[0], ...);

       

      Am I on the right lines here? What about clEnqueueUnmapMemObject() - do I need to use this at some point?

       

      I will want to run this kernel (and read the buffer) repeatedly, so are there any considerations there, e.g. will I have to call clEnqueueMapBuffer() each time?

        • Re: How to use pinned memory for reading from GPU?
          dipak

          Below are a couple of usage scenarios and the corresponding call sequences. I hope they help.

           

          Typical call sequences using clEnqueueReadBuffer:

           

          // called once

          deviceBuffer = clCreateBuffer ( )

          pinnedBuffer = clCreateBuffer ( CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR )

          pinnedMemory = clEnqueueMapBuffer (pinnedBuffer, CL_MAP_WRITE )

           

          // called multiple times

          clEnqueueNDRangeKernel (deviceBuffer )

          clEnqueueReadBuffer (deviceBuffer, pinnedMemory) // limited by PCI-e bandwidth

          Application uses pinnedMemory directly

           

          // called once

          clEnqueueUnmapMemObject (pinnedBuffer, pinnedMemory)

          Typical call sequences using clEnqueueCopyBuffer:

           

          // called once

          deviceBuffer = clCreateBuffer ( )

          pinnedBuffer = clCreateBuffer ( CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR )

           

           

          // called multiple times

          clEnqueueNDRangeKernel (deviceBuffer )

          clEnqueueCopyBuffer ( deviceBuffer, pinnedBuffer ) // limited by PCI-e bandwidth

          pinnedMemory = clEnqueueMapBuffer ( pinnedBuffer, CL_MAP_READ ) // almost a no-op, as the memory is already pinned

          Application uses pinnedMemory directly

          clEnqueueUnmapMemObject ( pinnedBuffer, pinnedMemory ) // no-op, as it was mapped for reading only

           

          Thanks.