OpenCL

andyste1
Adept I

Understanding buffer creation

Hi, I'm new to OpenCL and am trying to get a better understanding of how clCreateBuffer works. Does this command simply "reserve" an area of memory on the device, or does an implicit "copy" occur, e.g. to initialise the buffer? Regardless of what happens, is the created buffer guaranteed to be initialised with all zeros?

In my scenario I want to create a buffer that my kernel will write to. I don't need to copy anything to this buffer from the host; the only requirement is that the buffer is initialised with zeros before the kernel executes. What is the best/most performant way of doing so? (The host will need to read the contents of this buffer once the kernel has executed, though.)

Also, I will need to periodically reset this buffer to zeros. Do I use `clEnqueueFillBuffer`? I read elsewhere that it may be more performant to execute a kernel to do this.

dipak
Big Boss

The OpenCL runtime uses deferred allocation: it delays allocating a buffer until the buffer is first used. So, if an initialized host buffer is passed with CL_MEM_COPY_HOST_PTR, the runtime has to copy the data into a temporary runtime buffer. The memory is allocated on the device when the device first accesses the resource, and at that time any data that must be transferred to the resource is copied.

The buffer contents are not initialized at creation. If any initialization is required, the application needs to do it explicitly (for example, using CL_MEM_COPY_HOST_PTR).

In my scenario ....

A typical call sequence may be (assuming a dGPU):

  1. Create a zero-copy host-visible device buffer (with flag CL_MEM_USE_PERSISTENT_MEM_AMD) [there is a size limit, typically a few MB]
  2. clEnqueueFillBuffer (or run a kernel to fill the device buffer with zeros)
  3. Run the kernel
  4. clEnqueueReadBuffer

Please note that the actual steps depend on the exact usage and on the underlying hardware (say, APU or dGPU). I recommend running some experiments before choosing an approach.

I suggest reading section 1.3 (OpenCL Memory Objects) and section 1.4 (OpenCL Data Transfer Optimization) of the AMD OpenCL Optimization Guide, which explain how memory is allocated for buffer objects and the various optimized paths for data transfer. The guide also describes various application scenarios and the corresponding paths in the OpenCL API that are known to work well on AMD platforms.

Thanks.
