3 Replies Latest reply on Jul 23, 2018 6:44 AM by dipak

    opencl clEnqueueNDRangeKernel caused memory leak

    j0hnny

      Hi all,

            I am seeing a memory leak when calling clEnqueueNDRangeKernel in an infinite loop on AMD cards. The case below uses an empty kernel, but I do not think the empty kernel is the cause: I have also used matrix add/mul kernels, whose results were correct, and still saw the leak.

       

      My code looks like this:

            -----------------------------------------code  start------------------------------------------------------------------
            do {
                for (i = 0; i < ciDeviceCount; ++i) {
                    ciErrNum = clEnqueueNDRangeKernel(commandQueue[i], matrixEqual, 2, 0, globalWorkSize, localWorkSize,
                                                      0, NULL, &GPUExecution[i]);
                    oclCheckError(ciErrNum, CL_SUCCESS);
                    ciErrNum = clFinish(commandQueue[i]);
                    oclCheckError(ciErrNum, CL_SUCCESS);
                }
            } while(1);
            -----------------------------------------code  end----------------------------------------------------------------------

       

          My kernel is effectively empty:

       

      -----------------------------------------code  start------------------------------------------------------------------
           __kernel void
           matrixEqual(int m, int n)
           {
               m = n;
           }
        -----------------------------------------code  end----------------------------------------------------------------------

       

           After many iterations, my process's memory usage rises to about 7 GB and the process is finally killed by the Linux kernel.

           /var/log/syslog shows that the kernel killed the process because the system ran out of memory:

           snd kernel: [ 8779.289654] Out of memory: Kill process 5239 (myprocess) score 917 or sacrifice child
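
      To put numbers on the growth described above, one option is to sample the process's resident set size around a fixed number of iterations. The following is only a minimal sketch, assuming Linux and a readable /proc/self/status. The names read_rss_kb, rss_growth_kb and leaky_step are hypothetical helpers for illustration; they are not part of OpenCL or of the attached code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Resident set size of this process in kB, read from /proc/self/status
 * (Linux only). Returns -1 if the value cannot be read. */
long read_rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The line of interest looks like "VmRSS:     1234 kB". */
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/* Hypothetical harness: call step(ctx) `iters` times and store the change
 * in RSS (kB) in *growth_kb. Returns 0 on success, -1 if RSS is unreadable. */
int rss_growth_kb(void (*step)(void *), void *ctx, int iters, long *growth_kb)
{
    long before, after;

    assert(step != NULL && growth_kb != NULL && iters > 0);
    before = read_rss_kb();
    for (int i = 0; i < iters; ++i)
        step(ctx);
    after = read_rss_kb();
    if (before < 0 || after < 0)
        return -1;
    *growth_kb = after - before;
    return 0;
}

/* Keeps the last leaked pointer visible so the compiler cannot drop the allocation. */
static void *volatile last_leaked;

/* Sanity-check step that leaks *(size_t *)ctx bytes per call on purpose. */
void leaky_step(void *ctx)
{
    size_t bytes = *(size_t *)ctx;
    char *p = malloc(bytes);

    if (p != NULL) {
        memset(p, 1, bytes);   /* touch the pages so they count toward RSS */
        last_leaked = p;
    }
}
```

      For the case above, a step function would wrap the clEnqueueNDRangeKernel/clFinish pair for one device. Growth that keeps scaling with the iteration count points at a per-call leak rather than one-time setup cost.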

       

      OS version:

      Linux snd 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

       

      Part of clinfo:

      Platform Version: OpenCL 2.0 AMD-APP (2482.3)

       

        Platform ID: 0x7f19c67e1098

        Name: Ellesmere

        Vendor: Advanced Micro Devices, Inc.

        Device OpenCL C version: OpenCL C 1.2

        Driver version: 2482.3

        Profile: FULL_PROFILE

        Version: OpenCL 1.2 AMD-APP (2482.3)

       

        Device Type: CL_DEVICE_TYPE_GPU

        Vendor ID: 1002h

        Board name: Radeon RX 570 Series

       

      Hardware

      2 cores Intel(R) Celeron(R) CPU G3930 @ 2.90GHz

      8 cards Radeon RX 570 Series

       

      I have also attached the full code to this post.

        • Re: opencl clEnqueueNDRangeKernel caused memory leak
          dipak

          Hi,

          Thank you for reporting this.

          The attached code seems incomplete. Please share a complete repro. Also, please mention the driver version (e.g., AMDGPU-Pro X.Y). If the driver is not the latest one, please try the latest driver and share your observations.

           

          From the code snippet above, it looks like you never release the event object (GPUExecution[i]) returned by each clEnqueueNDRangeKernel call. Unreleased events can also cause a memory leak.

           

          Regards,

            • Re: opencl clEnqueueNDRangeKernel caused memory leak
              j0hnny

              Hi, dipak

                    Thanks for your kind reply.

                  

                    I tried your suggestion by removing GPUExecution from the clEnqueueNDRangeKernel call, but the memory leak still happens.

               

                     I would like to try your second suggestion: checking and updating the AMDGPU-Pro version. Could you please tell me:

                     1. How do I check my AMD GPU driver version? I did not install it myself, and dpkg -l amdgpu-pro shows that no amdgpu-pro package is installed, but I can see its folder at /opt/amdgpu-pro.

                     2. The newest amdgpu-pro I found on the website is 18.20 (Radeon™ Software for Linux® 18.20 Release Notes), but it says it only supports OpenCL 1.2, while my Radeon RX 570 Series card is said to support OpenCL 2.0. Where can I find a driver that supports OpenCL 2.0? Do I need to install ROCm for OpenCL 2.0 development?

                      Below is the amdgpu-pro 18.20 support list:

                      Below is the RX 570 Series entry:

               

                     Thanks a lot again.

                • Re: opencl clEnqueueNDRangeKernel caused memory leak
                  dipak

                  Currently, AMDGPU-Pro supports OpenCL 1.2 only. That is why the RX 570 is listed as an OpenCL 1.2 device even though the hardware supports OpenCL 2.0.

                  For OpenCL 2.0 kernel programming, you can choose ROCm. It provides OpenCL 2.0 compatible kernel language support with an OpenCL 1.2 compatible runtime.

                   

                  Also, judging from the clinfo version information above, the installed driver looks like an older one. Please install the latest amdgpu-pro 18.20 and check again. If the issue is still reproducible, please share a complete repro.

                   

                  Regards,