cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

levyfan
Journeyman III

Possible memory leak

We encounter memory leak problems and the app profiler can't help. (we simply use windows resource monitor.)

A test code sample is given and the code structure is simple:

cl_kernel memset;

cl_command_queue Q1, Q2;

for(;;) {

     cl_event Q1_evt[51];

     cl_event Q2_evt[50];

     clEnqueueNDRangeKernel(Q1, memset, ..., 0, NULL, &Q1_evt[0]);

     for (int iter = 0; iter < 50; iter++) {

         clEnqueueNDRangeKernel(Q2, memset, ..., 1, &Q1_evt[iter], &Q2_evt[iter]);

         clEnqueueNDRangeKernel(Q1, memset, ..., 1, &Q2_evt[iter], &Q1_evt[iter+1]);

     }

     for (int i=0; i<50; i++) {

         clReleaseEvent(Q1_evt);

         clReleaseEvent(Q2_evt);

     }

     clReleaseEvent(Q1_evt[50]);

     clFinish(Q1);

     clFinish(Q2);

}

it is a twisted waiting structure that Q1 waits for Q2 and vise versa.

i guess clReleaseEvent() may not function right, so i create a thread here: http://devgurus.amd.com/message/1283971

my environment:

HD 7970, APP SDK 2.7, Catalyst 12.6 Beta (8.98-120522a-139735E-ATI), win7 64bit

0 Likes
16 Replies
Wenju
Elite

Hi levyfan,

You used for(;;){}, why? And I think the inner loop is ok.

0 Likes

You can change it to something like " for(int k=0; k<1000; k++){} ".

In win7 resource monitor, we find that the program memory usage grows quickly and never goes down.

0 Likes

for(;;) {
  cl_event Q1_evt[51];
  cl_event Q2_evt[50];

  clEnqueueNDRangeKernel(queue, memset, 1, NULL, &n, NULL, 0, NULL, &Q1_evt[0]);
  error = clFlush(queue);
   if (error != CL_SUCCESS)
  {
   printf("clFlush failed err = %d\n", error);
   return(error);
  }
  
  cl_int eventStatus = CL_QUEUED;
  while(eventStatus != CL_COMPLETE)
  {
   error = clGetEventInfo(
                        Q1_evt[0],
                        CL_EVENT_COMMAND_EXECUTION_STATUS,
                        sizeof(cl_int),
                        &eventStatus,
                        NULL);
   if (error != CL_SUCCESS)
   {
    printf("clGetEventInfo failed err = %d\n", error);
    return(error);
   }
  }

  for (int iter = 0; iter < 50; iter++) {

   clEnqueueNDRangeKernel(secondaryQueue, memset, 1, NULL, &n, NULL, NULL, NULL, &Q2_evt[iter]);
   eventStatus = CL_QUEUED;
   while(eventStatus != CL_COMPLETE)
   {
    error = clGetEventInfo(
                        Q2_evt[iter],
                        CL_EVENT_COMMAND_EXECUTION_STATUS,
                        sizeof(cl_int),
                        &eventStatus,
                        NULL);
    if (error != CL_SUCCESS)
    {
     printf("clGetEventInfo failed err = %d\n", error);
     return(error);
    }
   }

   clEnqueueNDRangeKernel(queue, memset, 1, NULL, &n, NULL, 0, NULL, &Q1_evt[iter+1]);
   eventStatus = CL_QUEUED;
   while(eventStatus != CL_COMPLETE)
   {
    error = clGetEventInfo(
                        Q1_evt[iter+1],
                        CL_EVENT_COMMAND_EXECUTION_STATUS,
                        sizeof(cl_int),
                        &eventStatus,
                        NULL);
    if (error != CL_SUCCESS)
    {
     printf("clGetEventInfo failed err = %d\n", error);
     return(error);
    }
   }

  }
  for (int i=0; i<50; i++) {
   clReleaseEvent(Q1_evt);
   clReleaseEvent(Q2_evt);
  }
  clReleaseEvent(Q1_evt[50]);
 
  clFinish(queue);
  clFinish(secondaryQueue);
}

The memory usage is steady.

0 Likes

i see...

the idea of the original code is to flush 51 commands to Q1 and 50 commands to Q2, and then wait for them to be completed.

and in your code, the host execute a kernel and wait for it, and then execute another and wait again.

i guess there could be some leaks when different Queues are synced by opencl events.

0 Likes
levyfan
Journeyman III

so i simplify my code and the memory still leaks:

for(;;) {

    cl_event Q1_evt, Q1_evt_leak;

    cl_event Q2_evt;

    clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);

    clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);

    clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);

    clReleaseEvent(Q1_evt);

    clReleaseEvent(Q2_evt);

    clReleaseEvent(Q1_evt_leak);

    clFinish(Q1);

    clFinish(Q2);

}

and it's quite weird that the memory is stable if the 3rd clEnqueueNDRangeKernel with Q1_evt_leak is not called as follows:

    clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);

    clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);

    clReleaseEvent(Q1_evt);

    clReleaseEvent(Q2_evt);

anyone please?

0 Likes

I'm not sure about this: Maybe after you executed

clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt); 

    clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);  // checkpoint1

and then, you begin run

clReleaseEvent(Q1_evt); 

    clReleaseEvent(Q2_evt);

but maybe the checkpoint1 has not finished yet. Just speculating, you can also  erase the loop to get the profile, whether the release operation is executed after the kernel finished.

0 Likes

it has nothing to do with the checkpoint execution. even if i call clFinish before clReleaseEvent, the memory still leaks:

clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);

clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);

clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);

clFinish(Q1);

clFinish(Q2);

clReleaseEvent(Q1_evt);

clReleaseEvent(Q2_evt);

clReleaseEvent(Q1_evt_leak);

clFinish(Q1);

clFinish(Q2);

0 Likes

for(;;) {
  cl_event Q1_evt, Q1_evt_leak;
  cl_event Q2_evt;

  clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);
  clFinish(Q1);
  clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);
  clFinish(Q2);
  clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);
  clFinish(Q1);

  clReleaseEvent(Q1_evt);
  clReleaseEvent(Q2_evt);
  clReleaseEvent(Q1_evt_leak);

  clFinish(Q1);
  clFinish(Q2);
}

0 Likes

Your code is ok, but that's not what we want.

We provide a test case to demonstrate a possible memory leak bug, and the real code is far more complicated. The main idea is that the host batches lots of gpu kernels to different queues and those kernels are well synced by events. At the same time, the host cpus do lots of other heavy works.

In my test code, the reference count of cl_event Q1_evt is 1 at the end of the loop. That's why there are memory leaks.

So, if we call clReleaseEvent(Q1_evt) twice, the leak is prevented. But this is unacceptable if we modify the real code like that.

0 Likes

Confused!  Last time, I told you use while(eventStatus != CL_COMPLETE), and you said it would failed when using different Queues. But I think it would wok if they have the same context. I'm not sure about this. You can have a try.

0 Likes

hi wenju

you do not get what i meant.

in your code, the host execute a kernel and wait for it, and then execute another and wait again. but in our project, we do not want the host to wait again and again.

so, what you suggest would not help us. your solution just hide the memory leak bug rather than reveal it and fix it.

BTW, what i said "i guess there could be some leaks when different Queues are synced by opencl events.", i meant that the memory leak problem may be related to multiple queues operation in one context.

0 Likes

I see,

cl_event Q1_evt, Q1_evt_leak;
cl_event Q2_evt;
cl_int eventStatus;
for(;;) {
 

  clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);
 
  clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);
 
  clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);
 

   eventStatus = CL_QUEUED;
  while(eventStatus != CL_COMPLETE)
  {
   error = clGetEventInfo(
                        Q1_evt_leak,
                        CL_EVENT_COMMAND_EXECUTION_STATUS,
                        sizeof(cl_int),
                        &eventStatus,
                        NULL);
   if (error != CL_SUCCESS)
   {
    printf("clGetEventInfo failed err = %d\n", error);
    return(error);
   }
  }

  clReleaseEvent(Q1_evt);
  clReleaseEvent(Q2_evt);
  clReleaseEvent(Q1_evt_leak);
  clFinish(Q1);
  clFinish(Q2);
}

no cpu wait, no memory leak, kernels synced by event.

0 Likes

Hi Wenju

There is a dead-lock/loop in your code. You can detect it by adding printf like this:

for(int i; ; i++) {

     printf("%d\n", i);

     ...

     ...

     ...

     printf("%d\n", i);

}

Because of this dead-lock/loop, you will not see the memory grows up.  But in fact, i suppose that the cl_event is never set to CL_COMPLETE and the host is just looping in your while(eventStatus != CL_COMPLETE).

So, your code above reveals another possible bug.

0 Likes
peakitde
Journeyman III

Nice Thanx to share this.

0 Likes
levyfan
Journeyman III

Anyone?

0 Likes
bahamut
Journeyman III

I think the problem is that you've called clReleaseEvent().

You don't need to call it explicitly in that case, it just messes up the reference counter in the event.

When the work items finish, the reference counter in the event gets decremented automatically.

Your code starts the next iteration before the tasks got finished in the previous one.

0 Likes