10 Replies Latest reply on Oct 14, 2011 3:05 PM by LeeHowes

    clCreateBuffer call and Virtual Memory Size of the process

    shantanu
      clCreateBuffer call and Virtual Memory Size of the process

      Thanks for reading the post.

      I have observed that my process's virtual memory (VM) size increases in step with the number of times I call clCreateBuffer, and decreases once I release the cl_mem buffers. I had expected that the VM size of the process would not be affected by how many GPU buffers I create.

      Attached is the partial sample code for illustration.

      I am running this simple process on the Mac platform, and in my case creating X bytes of GPU buffers increases my process's virtual memory size by X bytes. I understand this is not a Mac forum, but please help me with the general question:
      "In OpenCL, can creating a GPU device buffer using clCreateBuffer with CL_MEM_READ_WRITE affect the VM size of the process itself?"

      for (int i = 0; i < NUM_CL_MEM_OBJS_CREATED; i++) {
          // Create a read/write buffer in the context (no host pointer supplied).
          cl_mem clMemBuffer = CL_CHECK_ERR(clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(float) * NUM_DATA, NULL, &_err));
          // Blocking write of the host data into the buffer, then drain the queue.
          CL_CHECK(clEnqueueWriteBuffer(queue, clMemBuffer, CL_TRUE, 0, sizeof(float) * NUM_DATA, static_cast<void*>(&dataInput[0]), 0, NULL, NULL));
          CL_CHECK(clFinish(queue));
          totalBytesGPUMem += sizeof(float) * NUM_DATA;
          clMemVec[i] = clMemBuffer;
          printf("Total GPU Buffer Allocated so far %ld bytes (%f MB) ||| Virtual Memory of Process %d MB\n", totalBytesGPUMem, totalBytesGPUMem / (1024.0 * 1024.0), GetProcessVirtualMemoryUsageInMB());
          usleep(1000000);
      }
      printf("Now deleting GPU Memory allocated\n");
      for (int i = 0; i < NUM_CL_MEM_OBJS_CREATED; i++) {
          // Release each buffer and report the process VM size after each release.
          CL_CHECK(clReleaseMemObject(clMemVec[i]));
          printf("Virtual Memory of Process %d MB\n", GetProcessVirtualMemoryUsageInMB());
      }

        • clCreateBuffer call and Virtual Memory Size of the process
          genaganna

           


          AMD is not responsible for the Mac platform. I have one question: did you create the context with both CPU and GPU devices?

            • clCreateBuffer call and Virtual Memory Size of the process
              shantanu

              Thanks for your response. 

              I pass CL_DEVICE_TYPE_GPU when I call clGetDeviceIDs, meaning I'm only searching for GPUs on the OpenCL platform.

              And then I create a context for one particular GPU device ...

              May I ask what the behavior would be on the Windows platform?

              Thanks for your time.

               

                • clCreateBuffer call and Virtual Memory Size of the process
                  genaganna

                   


                   



                  You should eventually get the error CL_MEM_OBJECT_ALLOCATION_FAILURE from clCreateBuffer.

                  If you don't call clEnqueueWriteBuffer, more buffers will be created before that happens.

                    • clCreateBuffer call and Virtual Memory Size of the process
                      shantanu

                      Yes, that's there. I understand that.

                      But should the virtual memory size of the process making the clCreateBuffer calls increase? Say, on the Windows platform?

                        • clCreateBuffer call and Virtual Memory Size of the process
                          genaganna

                           


                          It won't increase.

                            • clCreateBuffer call and Virtual Memory Size of the process
                              LeeHowes

                              Presumably the runtime will have to allocate staging buffers on the host. 

                              Remember, CL buffers are not allocated on a device. They are allocated in a context. Does your clCreateBuffer call at any point define the device you want to allocate it on? The only hint in that direction is when you enqueue the write operation.

                                • clCreateBuffer call and Virtual Memory Size of the process
                                  shantanu

                                  Yes, it's correct that CL buffers are allocated in a context (the clCreateBuffer call needs a context, and I'm creating the context for one particular GPU device). After creating the buffer, I write data to it using clEnqueueWriteBuffer (I have attached sample code). At that point, shouldn't the staging buffer on the host be relinquished automatically?

                                  I guess there is a broader question here that I'm trying to understand. Does the staging buffer on the host continue to exist until clReleaseMemObject is called, even though data has actually been written to the created buffer? Or is the behavior dependent on the vendor's driver?

                                  Thanks

                                    • clCreateBuffer call and Virtual Memory Size of the process
                                      LeeHowes

                                      But you haven't moved the buffer to any device with the enqueued write. You just moved it in a queue associated with the device. In any case, if the buffer were relinquished on the host, you would risk an out-of-memory error if the runtime tried to move it back later. The behaviour here is entirely runtime and driver dependent. In this case the OpenCL team here has set things up in the way they feel is most efficient.

                                        • clCreateBuffer call and Virtual Memory Size of the process
                                          shantanu

                                          Actually, as in the attached sample code, I call clFinish(queue) just after issuing clEnqueueWriteBuffer(), so that should actually move the buffer to the device.

                                          But I understand that I can't predict or know for sure whether the staging buffer on the host will be destroyed, or under what circumstances. So I guess I have to assume that host memory will be released only when I explicitly call clReleaseMemObject.

                                          But then one last thing. If my GPU has 2GB of global memory, then my app on the host that wants to use that 2GB of GPU global memory needs an equivalent amount of extra virtual memory on the host side. If my app is 32-bit, my virtual memory is limited to 4GB on the host side. So what's the solution here? Move to a 64-bit app on the host?
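
The arithmetic behind this question can be sketched roughly as follows. This is only an illustration under stated assumptions, not a description of any vendor's runtime: it assumes the runtime keeps one host byte mapped for every buffer byte (the behaviour observed on Mac earlier in this thread), and it takes the 4GB limit from the post at face value, although the usable address space of a 32-bit process is typically smaller. The helper names hostMirrorBytes and fitsInAddressSpace are hypothetical and are not part of OpenCL.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1ull << 20;
constexpr std::uint64_t kGiB = 1ull << 30;

// Hypothetical helper (not an OpenCL call): host virtual address space
// consumed if the runtime mirrors every live buffer on the host.
// ASSUMPTION: one host byte per buffer byte, as observed in this thread.
std::uint64_t hostMirrorBytes(std::uint64_t bufferBytes, std::uint64_t bufferCount) {
    return bufferBytes * bufferCount;
}

// Hypothetical helper: true if the mirrored buffers plus the application's
// own usage fit in the given address space. All sizes are in bytes.
bool fitsInAddressSpace(std::uint64_t mirroredBytes,
                        std::uint64_t appBytes,
                        std::uint64_t addressSpaceBytes) {
    assert(addressSpaceBytes > 0);
    // Written as a subtraction so the comparison cannot overflow.
    return mirroredBytes <= addressSpaceBytes &&
           appBytes <= addressSpaceBytes - mirroredBytes;
}
```

With these assumptions, eight 256MB buffers mirror to 2GB on the host. That fits in a 4GB address space alongside 1GB of other application usage, but not alongside 3GB, whereas a 64-bit address space leaves ample room in both cases.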

                                          Thanks

                                            • clCreateBuffer call and Virtual Memory Size of the process
                                              LeeHowes

                                              No, you're missing my point. clEnqueueWriteBuffer does NOT enqueue a write to the device. It enqueues a move from your array on the host into the runtime. When it completes, all you are guaranteed is that the runtime has a copy of your data. You have no idea where it is.

                                              It happens that, sensibly enough, our runtime will copy that to the GPU under most circumstances, but at times it won't make that move until you launch a kernel dependent on the buffer. It's implementation dependent and the runtime has to be able to move the data back to the host on demand if the driver needs the GPU memory for, say, rendering.

                                              I would say you can't guarantee anything will be freed until you release the mem object.