16 Replies Latest reply on Mar 10, 2013 2:58 PM by bahamut

    Possible memory leak

    levyfan

      We encounter memory leak problems and the app profiler can't help. (we simply use windows resource monitor.)

      A test code sample is given and the code structure is simple:

       

      cl_kernel memset;

      cl_command_queue Q1, Q2;

      for(;;) {

           cl_event Q1_evt[51];

           cl_event Q2_evt[50];

           clEnqueueNDRangeKernel(Q1, memset, ..., 0, NULL, &Q1_evt[0]);

           for (int iter = 0; iter < 50; iter++) {

               clEnqueueNDRangeKernel(Q2, memset, ..., 1, &Q1_evt[iter], &Q2_evt[iter]);

               clEnqueueNDRangeKernel(Q1, memset, ..., 1, &Q2_evt[iter], &Q1_evt[iter+1]);

           }

           for (int i=0; i<50; i++) {

               clReleaseEvent(Q1_evt[i]);

               clReleaseEvent(Q2_evt[i]);

           }

           clReleaseEvent(Q1_evt[50]);

           clFinish(Q1);

           clFinish(Q2);

      }

       

      it is a twisted waiting structure that Q1 waits for Q2 and vise versa.

      i guess clReleaseEvent() may not function right, so i create a thread here: http://devgurus.amd.com/message/1283971

       

      my environment:

      HD 7970, APP SDK 2.7, Catalyst 12.6 Beta (8.98-120522a-139735E-ATI), win7 64bit

        • Re: Possible memory leak
          Wenju

          Hi levyfan,

          You used for(;;){}, why? And I think the inner loop is ok.

            • Re: Possible memory leak
              levyfan

              You can change it to something like " for(int k=0; k<1000; k++){} ".

               

              In win7 resource monitor, we find that the program memory usage grows quickly and never goes down.

                • Re: Possible memory leak
                  Wenju

                  for(;;) {
                    cl_event Q1_evt[51];
                    cl_event Q2_evt[50];

                    clEnqueueNDRangeKernel(queue, memset, 1, NULL, &n, NULL, 0, NULL, &Q1_evt[0]);
                    error = clFlush(queue);
                     if (error != CL_SUCCESS)
                    {
                     printf("clFlush failed err = %d\n", error);
                     return(error);
                    }
                    
                    cl_int eventStatus = CL_QUEUED;
                    while(eventStatus != CL_COMPLETE)
                    {
                     error = clGetEventInfo(
                                          Q1_evt[0],
                                          CL_EVENT_COMMAND_EXECUTION_STATUS,
                                          sizeof(cl_int),
                                          &eventStatus,
                                          NULL);
                     if (error != CL_SUCCESS)
                     {
                      printf("clGetEventInfo failed err = %d\n", error);
                      return(error);
                     }
                    }

                    for (int iter = 0; iter < 50; iter++) {

                     clEnqueueNDRangeKernel(secondaryQueue, memset, 1, NULL, &n, NULL, NULL, NULL, &Q2_evt[iter]);
                     eventStatus = CL_QUEUED;
                     while(eventStatus != CL_COMPLETE)
                     {
                      error = clGetEventInfo(
                                          Q2_evt[iter],
                                          CL_EVENT_COMMAND_EXECUTION_STATUS,
                                          sizeof(cl_int),
                                          &eventStatus,
                                          NULL);
                      if (error != CL_SUCCESS)
                      {
                       printf("clGetEventInfo failed err = %d\n", error);
                       return(error);
                      }
                     }

                     clEnqueueNDRangeKernel(queue, memset, 1, NULL, &n, NULL, 0, NULL, &Q1_evt[iter+1]);
                     eventStatus = CL_QUEUED;
                     while(eventStatus != CL_COMPLETE)
                     {
                      error = clGetEventInfo(
                                          Q1_evt[iter+1],
                                          CL_EVENT_COMMAND_EXECUTION_STATUS,
                                          sizeof(cl_int),
                                          &eventStatus,
                                          NULL);
                      if (error != CL_SUCCESS)
                      {
                       printf("clGetEventInfo failed err = %d\n", error);
                       return(error);
                      }
                     }

                    }
                    for (int i=0; i<50; i++) {
                     clReleaseEvent(Q1_evt[i]);
                     clReleaseEvent(Q2_evt[i]);
                    }
                    clReleaseEvent(Q1_evt[50]);
                   
                    clFinish(queue);
                    clFinish(secondaryQueue);
                  }

                  The memory usage is steady.

                    • Re: Possible memory leak
                      levyfan

                      i see...

                       

                      the idea of the original code is to flush 51 commands to Q1 and 50 commands to Q2, and then wait for them to be completed.

                      and in your code, the host execute a kernel and wait for it, and then execute another and wait again.

                       

                      i guess there could be some leaks when different Queues are synced by opencl events.

                • Re: Possible memory leak
                  levyfan

                  so i simplify my code and the memory still leaks:

                   

                  for(;;) {

                      cl_event Q1_evt, Q1_evt_leak;

                      cl_event Q2_evt;

                      clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);

                      clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);

                      clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);

                      clReleaseEvent(Q1_evt);

                      clReleaseEvent(Q2_evt);

                      clReleaseEvent(Q1_evt_leak);

                      clFinish(Q1);

                      clFinish(Q2);

                  }

                   

                   

                  and it's quite weird that the memory is stable if the 3rd clEnqueueNDRangeKernel with Q1_evt_leak is not called as follows:

                      clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);

                      clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);

                      clReleaseEvent(Q1_evt);

                      clReleaseEvent(Q2_evt);

                   

                  anyone please?

                    • Re: Possible memory leak
                      Wenju

                      I'm not sure about this: Maybe after you executed

                      clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt); 

                          clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);  // checkpoint1

                      and then, you begin run

                      clReleaseEvent(Q1_evt); 

                          clReleaseEvent(Q2_evt);

                      but maybe the checkpoint1 has not finished yet. Just speculating, you can also  erase the loop to get the profile, whether the release operation is executed after the kernel finished.

                        • Re: Possible memory leak
                          levyfan

                          it has nothing to do with the checkpoint execution. even if i call clFinish before clReleaseEvent, the memory still leaks:

                           

                          clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);

                          clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);

                          clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);

                          clFinish(Q1);

                          clFinish(Q2);

                          clReleaseEvent(Q1_evt);

                          clReleaseEvent(Q2_evt);

                          clReleaseEvent(Q1_evt_leak);

                          clFinish(Q1);

                          clFinish(Q2);

                            • Re: Possible memory leak
                              Wenju

                              for(;;) {
                                cl_event Q1_evt, Q1_evt_leak;
                                cl_event Q2_evt;

                                clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);
                                clFinish(Q1);
                                clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);
                                clFinish(Q2);
                                clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);
                                clFinish(Q1);

                                clReleaseEvent(Q1_evt);
                                clReleaseEvent(Q2_evt);
                                clReleaseEvent(Q1_evt_leak);

                                clFinish(Q1);
                                clFinish(Q2);
                              }

                                • Re: Possible memory leak
                                  levyfan

                                  Your code is ok, but that's not what we want.

                                  We provide a test case to demonstrate a possible memory leak bug, and the real code is far more complicated. The main idea is that the host batches lots of gpu kernels to different queues and those kernels are well synced by events. At the same time, the host cpus do lots of other heavy works.

                                   

                                  In my test code, the reference count of cl_event Q1_evt is 1 at the end of the loop. That's why there are memory leaks.

                                  So, if we call clReleaseEvent(Q1_evt) twice, the leak is prevented. But this is unacceptable if we modify the real code like that.

                                    • Re: Possible memory leak
                                      Wenju

                                      Confused!  Last time, I told you use while(eventStatus != CL_COMPLETE), and you said it would failed when using different Queues. But I think it would wok if they have the same context. I'm not sure about this. You can have a try.

                                        • Re: Possible memory leak
                                          levyfan

                                          hi wenju

                                           

                                          you do not get what i meant.

                                          in your code, the host execute a kernel and wait for it, and then execute another and wait again. but in our project, we do not want the host to wait again and again.

                                           

                                          so, what you suggest would not help us. your solution just hide the memory leak bug rather than reveal it and fix it.

                                           

                                          BTW, what i said "i guess there could be some leaks when different Queues are synced by opencl events.", i meant that the memory leak problem may be related to multiple queues operation in one context.

                                            • Re: Possible memory leak
                                              Wenju

                                              I see,

                                              cl_event Q1_evt, Q1_evt_leak;
                                              cl_event Q2_evt;
                                              cl_int eventStatus;
                                              for(;;) {
                                               

                                                clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 0, NULL, &Q1_evt);
                                               
                                                clEnqueueNDRangeKernel(Q2, memset, 1, NULL, &N, NULL, 1, &Q1_evt, &Q2_evt);
                                               
                                                clEnqueueNDRangeKernel(Q1, memset, 1, NULL, &N, NULL, 1, &Q2_evt, &Q1_evt_leak);
                                               

                                                 eventStatus = CL_QUEUED;
                                                while(eventStatus != CL_COMPLETE)
                                                {
                                                 error = clGetEventInfo(
                                                                      Q1_evt_leak,
                                                                      CL_EVENT_COMMAND_EXECUTION_STATUS,
                                                                      sizeof(cl_int),
                                                                      &eventStatus,
                                                                      NULL);
                                                 if (error != CL_SUCCESS)
                                                 {
                                                  printf("clGetEventInfo failed err = %d\n", error);
                                                  return(error);
                                                 }
                                                }

                                                clReleaseEvent(Q1_evt);
                                                clReleaseEvent(Q2_evt);
                                                clReleaseEvent(Q1_evt_leak);
                                                clFinish(Q1);
                                                clFinish(Q2);
                                              }

                                              no cpu wait, no memory leak, kernels synced by event.

                                                • Re: Possible memory leak
                                                  levyfan

                                                  Hi Wenju

                                                   

                                                  There is a dead-lock/loop in your code. You can detect it by adding printf like this:

                                                  for(int i; ; i++) {

                                                       printf("%d\n", i);

                                                       ...

                                                       ...

                                                       ...

                                                       printf("%d\n", i);

                                                  }

                                                   

                                                  Because of this dead-lock/loop, you will not see the memory grows up.  But in fact, i suppose that the cl_event is never set to CL_COMPLETE and the host is just looping in your while(eventStatus != CL_COMPLETE).

                                                   

                                                  So, your code above reveals another possible bug.

                                  • Re: Possible memory leak
                                    peakitde

                                    Nice Thanx to share this.

                                    • Re: Possible memory leak
                                      levyfan

                                      Anyone?

                                      • Re: Possible memory leak
                                        bahamut

                                        I think the problem is that you've called clReleaseEvent().

                                        You don't need to call it explicitly in that case, it just messes up the reference counter in the event.

                                        When the work items finish, the reference counter in the event gets decremented automatically.

                                        Your code starts the next iteration before the tasks got finished in the previous one.