shantanu
Journeyman III

clCreateBuffer call and Virtual Memory Size of the process

Thanks for reading the post.

I have observed that my process's virtual memory (VM) size increases in step with the number of times I call clCreateBuffer, and decreases once I release the cl_mem buffers. I had expected that the VM size of the process would not be affected by how many GPU buffers I create.

Attached is the partial sample code for illustration.

I am running this simple process on the Mac platform, and in my case creating X amount of GPU buffers increases my process's virtual memory size by X. I understand this is not a Mac forum, but please help me with the general question:
"In OpenCL, can creating a GPU device buffer using clCreateBuffer with CL_MEM_READ_WRITE affect the VM size of the process itself?"

    for (int i = 0; i < NUM_CL_MEM_OBJS_CREATED; i++) {
        // Create a read/write buffer in the context; no host pointer is supplied.
        cl_mem clMemBuffer = CL_CHECK_ERR(clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float) * NUM_DATA, NULL, &_err));
        // Blocking write of the host data into the buffer, then drain the queue.
        CL_CHECK(clEnqueueWriteBuffer(queue, clMemBuffer, CL_TRUE, 0, sizeof(float) * NUM_DATA, static_cast<void*>(&dataInput[0]), 0, NULL, NULL));
        CL_CHECK(clFinish(queue));
        totalBytesGPUMem += sizeof(float) * NUM_DATA;
        clMemVec[i] = clMemBuffer;
        printf("Total GPU Buffer Allocated so far %ld bytes (%f MB) ||| Virtual Memory of Process %d MB\n", totalBytesGPUMem, totalBytesGPUMem / (1024.0 * 1024.0), GetProcessVirtualMemoryUsageInMB());
        usleep(1000000);
    }

    printf("Now deleting GPU Memory allocated\n");
    for (int i = 0; i < NUM_CL_MEM_OBJS_CREATED; i++) {
        // Release each buffer and report the process VM size after each release.
        CL_CHECK(clReleaseMemObject(clMemVec[i]));
        printf("Virtual Memory of Process %d MB\n", GetProcessVirtualMemoryUsageInMB());
    }
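
The sample calls GetProcessVirtualMemoryUsageInMB(), whose implementation was not posted. For readers who want to reproduce the measurement, here is a minimal sketch of what such a helper might look like. It is an assumption, not the poster's code: on Mac it asks the Mach kernel for the task's virtual size, and elsewhere it assumes a Linux-style /proc/self/statm.

```cpp
#include <cstdio>

#if defined(__APPLE__)
#include <mach/mach.h>
#else
#include <unistd.h>
#endif

// Hypothetical stand-in for the helper used in the sample above.
// Returns the process virtual memory size in MB, or -1 on failure.
int GetProcessVirtualMemoryUsageInMB() {
#if defined(__APPLE__)
    // Ask the Mach kernel for basic info about the current task.
    mach_task_basic_info_data_t info;
    mach_msg_type_number_t count = MACH_TASK_BASIC_INFO_COUNT;
    if (task_info(mach_task_self(), MACH_TASK_BASIC_INFO,
                  reinterpret_cast<task_info_t>(&info), &count) != KERN_SUCCESS) {
        return -1;
    }
    return static_cast<int>(info.virtual_size / (1024ULL * 1024ULL));
#else
    // Assumes Linux: the first field of /proc/self/statm is the total
    // program size, measured in pages.
    std::FILE* f = std::fopen("/proc/self/statm", "r");
    if (f == nullptr) {
        return -1;
    }
    unsigned long long pages = 0;
    const int matched = std::fscanf(f, "%llu", &pages);
    std::fclose(f);
    if (matched != 1) {
        return -1;
    }
    const unsigned long long pageSize =
        static_cast<unsigned long long>(sysconf(_SC_PAGESIZE));
    return static_cast<int>((pages * pageSize) / (1024ULL * 1024ULL));
#endif
}
```

Note that this reports reserved address space, not resident memory, so it can grow whenever the runtime maps or reserves host memory for a buffer, even if those pages are never touched.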

0 Likes
10 Replies
genaganna
Journeyman III

AMD is not responsible for the Mac platform. I have one question: did you create the context with both CPU and GPU devices?

0 Likes

Thanks for your response. 

I pass CL_DEVICE_TYPE_GPU when I call clGetDeviceIDs, meaning I'm only searching for GPUs on the OpenCL platform.

I then create a context for one particular GPU device.

May I ask what the behavior would be on the Windows platform?

Thanks for your time.

 

0 Likes

You should eventually get the error CL_MEM_OBJECT_ALLOCATION_FAILURE from clCreateBuffer.

If you don't call clEnqueueWriteBuffer, you will be able to create more buffers before that happens.

0 Likes

Yes, that's true. I understand that.

But should the virtual memory size of the process making the clCreateBuffer calls increase? Say, on the Windows platform?

0 Likes

It won't increase.

0 Likes

Presumably the runtime will have to allocate staging buffers on the host. 

Remember, CL buffers are not allocated on a device. They are allocated in a context. Does your clCreateBuffer call at any point define the device you want to allocate it on? The only hint in that direction is when you enqueue the write operation.

0 Likes

Yes, it's correct that CL buffers are allocated in a context (the clCreateBuffer call needs a context, and I'm creating the context for one particular GPU device). After creating the buffer I write data to it using clEnqueueWriteBuffer (I have attached sample code). At that point, shouldn't the staging buffer on the host be relinquished automatically?

I guess there is a broader question here that I'm trying to understand. Does the staging buffer on the host continue to exist until clReleaseMemObject is called, even after data has actually been written to the created buffer? Or is the behavior dependent on the vendor's driver?

Thanks

0 Likes

But you haven't moved the buffer to any device with clEnqueueWriteBuffer. You just moved it in a queue associated with the device. In any case, if the host copy were relinquished, the runtime would risk an out-of-memory error if it tried to move the buffer back later. This behaviour is entirely runtime and driver dependent. In this case the OpenCL team here has set things up in the way they feel is most efficient.

0 Likes

Actually, as in the attached sample code, I call clFinish(queue) just after issuing clEnqueueWriteBuffer(), so that should actually move the buffer to the device.

But I understand that I can't predict or know for sure whether the staging buffer on the host will be destroyed, or under what circumstances. So I guess I have to assume that host memory will be released only when I make an explicit clReleaseMemObject call.

One last thing, though. If my GPU has 2GB of global memory, does that mean my app on the host, which wants to use that 2GB of GPU global memory, needs an equivalent amount of extra virtual memory on the host side? If my app is 32-bit, then my virtual memory is limited to 4GB on the host side. So what's the solution here? Move to a 64-bit app on the host?

Thanks
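
The arithmetic behind this question can be sketched as a small check. It assumes, as observed on Mac earlier in the thread, that every device buffer is mirrored one-to-one in host virtual memory; whether a given runtime does this is implementation dependent. FitsInAddressSpace and its parameters are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if `bufferBytes` of host-side mirroring,
// added to `baselineBytes` of address space the process already uses,
// stays within `addressSpaceBytes`. Written to avoid unsigned overflow.
bool FitsInAddressSpace(std::uint64_t baselineBytes,
                        std::uint64_t bufferBytes,
                        std::uint64_t addressSpaceBytes) {
    if (baselineBytes > addressSpaceBytes) {
        return false;
    }
    return bufferBytes <= addressSpaceBytes - baselineBytes;
}
```

With a 4GB address space, a process that already uses 512MB could mirror 2GB of buffers, but one that already uses 2.5GB could not. In practice a 32-bit process often has less than the full 4GB available, because the operating system reserves part of the address space, so the real margin is smaller than this check suggests.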

0 Likes

No, you're missing my point. clEnqueueWriteBuffer does NOT enqueue a write to the device. It enqueues a move from your array on the host into the runtime. When it completes, all you are guaranteed is that the runtime has a copy of your data. You have no idea where it is.

It happens that, sensibly enough, our runtime will copy that to the GPU under most circumstances, but at times it won't make that move until you launch a kernel dependent on the buffer. It's implementation dependent and the runtime has to be able to move the data back to the host on demand if the driver needs the GPU memory for, say, rendering.

I would say you can't guarantee anything will be freed until you release the mem object.
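
The behaviour described above can be pictured with a toy model. This is purely illustrative: ToyBuffer, Residency and the method names are hypothetical and are not part of the OpenCL API or of any vendor runtime. The sketch models the lazy case, where the runtime keeps a host-side copy from the write onward, uploads only when a kernel needs the buffer, can fall back to the host copy on eviction, and frees host memory only when the object is destroyed (standing in for clReleaseMemObject).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only: NOT the OpenCL API and NOT how any specific runtime works.
enum class Residency { HostOnly, HostAndDevice };

class ToyBuffer {
public:
    // Models clCreateBuffer: the runtime reserves host-side backing storage.
    explicit ToyBuffer(std::size_t bytes)
        : hostShadow_(bytes), residency_(Residency::HostOnly) {}

    // Models clEnqueueWriteBuffer + clFinish: the runtime now has a copy
    // of the data, but it has not necessarily reached the device.
    void Write(const unsigned char* src, std::size_t bytes) {
        const std::size_t n = std::min(bytes, hostShadow_.size());
        std::copy(src, src + n, hostShadow_.begin());
    }

    // Models launching a kernel that depends on the buffer: upload happens now.
    void UseInKernel() { residency_ = Residency::HostAndDevice; }

    // Models the driver reclaiming GPU memory (for example, for rendering):
    // the host copy is what makes this possible without losing data.
    void EvictFromDevice() { residency_ = Residency::HostOnly; }

    std::size_t HostBytesHeld() const { return hostShadow_.size(); }
    Residency Where() const { return residency_; }

private:
    std::vector<unsigned char> hostShadow_;  // freed only when the object dies
    Residency residency_;
};
```

In this model HostBytesHeld() stays constant for the whole lifetime of the buffer, which is consistent with the virtual memory growth reported in the original post.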

 

0 Likes