31 Replies Latest reply on Aug 4, 2014 2:41 AM by pinform

    Error handling mmap: Memory leak and "GART error"

    ribalda

      We are trying to process images from a framegrabber without copying them to an intermediate buffer. The data appears on a mmaped buffer.

       

      Every time we use the mmaped buffer in OpenCL we get an error message in syslog, and the Slab value in /proc/meminfo increases. On our system we are leaking around 1 MB per second!

       

      syslog error: [fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space

       

      To demonstrate the error we have written a simple program with a loop that maps and unmaps a mmaped buffer. While it runs, please check dmesg and watch /proc/meminfo. The same program works fine on other OpenCL implementations (e.g. Nvidia). We are using fglrx v13.4, but the error also exists in previous versions.

       

      After some debugging we believe it is caused by the function ATI_API_CALL KCL_LockUserPages (firegl_public.c), which calls get_user_pages. That function returns -EFAULT on mmaped or I/O pages (VM_IO or VM_PFNMAP mappings), and it also checks for write permission:

       

      if (!vma ||
          (vma->vm_flags & (VM_IO | VM_PFNMAP)) ||
          !(vm_flags & vma->vm_flags))
              return i ? : -EFAULT;
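
To make the effect of the check quoted above concrete, here is a minimal user-space sketch that models it. It is not kernel code and not the fglrx implementation: `gup_would_accept` is a hypothetical helper, and the flag values are copied from the Linux kernel's include/linux/mm.h only so that the example compiles on its own (treat them as illustrative for other kernel versions).

```c
#include <stdbool.h>

/* Flag values as found in the Linux kernel's include/linux/mm.h;
 * repeated here only so this user-space sketch is self-contained. */
#define VM_READ   0x00000001UL
#define VM_WRITE  0x00000002UL
#define VM_PFNMAP 0x00000400UL
#define VM_IO     0x00004000UL

/* Hypothetical helper (an assumption, not a kernel function): returns true
 * if a mapping with flags `vma_flags` would pass the quoted check when
 * access of type `wanted` (VM_READ and/or VM_WRITE) is requested. */
bool gup_would_accept(unsigned long vma_flags, unsigned long wanted)
{
    if (vma_flags & (VM_IO | VM_PFNMAP))
        return false;   /* device or raw-PFN mappings are refused outright */
    if (!(wanted & vma_flags))
        return false;   /* none of the requested access bits is permitted */
    return true;
}
```

Under this model, an ordinary read/write mapping is accepted, while a device mapping that the driver marked VM_IO or VM_PFNMAP is rejected regardless of its permissions, which matches the -EFAULT path described in the post.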

       

      Please understand that this is a huge restriction for us: to work with images from the framebuffer we would have to copy its content to another area for no reason, reducing our framerate dramatically. It is also incomprehensible why we would not be able to use mmaped buffers.

          • Re: Error handling mmap: Memory leak and "GART error"
            himanshu.gautam

            This was reproducible. I am forwarding it to the proper team after some more investigation.

            One small question: the original code you gave produces a compilation error saying "close() was not declared". Any help on how to fix that? For now I am running the code with that call commented out.

              • Re: Error handling mmap: Memory leak and "GART error"
                ribalda

                Just add

                 

                #include <unistd.h>

                 

                To the beginning of the source file

                 

                Thanks

                  • Re: Error handling mmap: Memory leak and "GART error"
                    himanshu.gautam

                    Hi,

                    After more investigation, it looks like the issue is invalid. First, I was not running dmesg --clear earlier, and therefore kept seeing the error log, as it was still present in the ring buffer. Secondly, the code has an infinite loop, which has to be disabled to run the test to completion. Thirdly, I do get the "GART ..." error with the current test code, because the following code is buggy (IMHO):

                    The memory region pointed to by pinned_mem has already been added to the GPU address space (GART memory). Later we try to re-lock it using map/unmap calls.

                    Check out the attached code for the MatrixTranspose sample, which does not give any GART errors for me.

                      • Re: Error handling mmap: Memory leak and "GART error"
                        himanshu.gautam

                        I am not able to attach that code because of some forum issue. Here is the change I made to the original code, after which no GART errors are observed:

                         

                        //while (1)
                        {
                        #if 1
                                /* Wrap the existing host memory (pinned_mem) in a buffer object. */
                                pinned_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE|CL_MEM_USE_HOST_PTR, BUF_SIZE, pinned_mem, &err);
                                cl_err_exit(err, "clCreateBuffer");
                        #else
                                pinned_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, BUF_SIZE, NULL, &err);
                                cl_err_exit(err, "clCreateBuffer");
                        #endif

                                /* Pass &err so that the check below tests this call's result. */
                                device_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY, BUF_SIZE, NULL, &err);
                                cl_err_exit(err, "clCreateBuffer");

                                /* Blocking map of the host-pointer buffer for reading. */
                                pinned_mem = (float *) clEnqueueMapBuffer(gpu_queue, pinned_buffer, CL_TRUE, CL_MAP_READ, 0, BUF_SIZE, 0, NULL, NULL, &err);
                                cl_err_exit(err, "clEnqueueMapBuffer");

                                //err = clEnqueueWriteBuffer(gpu_queue, pinned_buffer, CL_FALSE, 0, BUF_SIZE, pinned_mem, 0, NULL, NULL);
                                //cl_err_exit(err, "clEnqueueWriteBuffer");

                                err = clEnqueueUnmapMemObject(gpu_queue, pinned_buffer, pinned_mem, 0, NULL, NULL);
                                cl_err_exit(err, "clEnqueueUnmapMemObject");

                                /* Wait for the unmap to complete before releasing the buffers. */
                                clFinish(gpu_queue);
                                clReleaseMemObject(pinned_buffer);
                                clReleaseMemObject(device_buffer);
                        }

                          • Re: Error handling mmap: Memory leak and "GART error"
                            ribalda

                            Hello Himanshu


                            Thanks for your response.


                            About your point 1)

                            I was not doing dmesg --clear earlier, and therefore reproduced the error log , as it was consistently present in ring_buffer

                             

                            I don't understand what you mean here. Did you run the program we gave you? Did you get the GART error message in dmesg?


                            About your point 2)


                            The reason for the infinite loop is so that you can inspect the Slab counter in /proc/meminfo. I believe that if the implementation is right the Slab size should not keep increasing, even if the program runs for days. In our test setup we are losing megabytes per second. It is impossible for us to run a program for more than a couple of minutes if it uses mmaped memory (framebuffer), which is clearly wrong, especially since this does not happen with other implementations (e.g. Nvidia, Intel).
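
One way to watch the Slab counter from inside a test loop, instead of reading the file by hand, is sketched below. This is only an illustration under stated assumptions: it relies on Linux exposing a `Slab:` line (in kB) in /proc/meminfo, and the helper names `slab_kb_from_text` and `read_slab_kb` are hypothetical, not part of the original test program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the "Slab:" value (in kB) from text in
 * /proc/meminfo format. Returns -1 if the field is not found. */
long slab_kb_from_text(const char *text)
{
    const char *line = text;
    while (line && *line) {
        long kb;
        /* Matches only at the start of a line, so "SReclaimable:" is skipped. */
        if (sscanf(line, "Slab: %ld kB", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line)
            line++;                 /* step past the newline */
    }
    return -1;
}

/* Read the current Slab value from /proc/meminfo (Linux only).
 * Returns -1 if the file cannot be read or the field is missing. */
long read_slab_kb(void)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return slab_kb_from_text(buf);
}
```

Calling `read_slab_kb()` before the loop and again every few thousand iterations, then printing the difference, would give a rough leak rate. The Slab value also moves for unrelated reasons, so only a steady upward trend over many iterations is meaningful.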


                            About your point 3)


                            I don't see why our code is wrong. We NEED to use the data from the framebuffer (or the mmaped file). If you disable CL_MEM_USE_HOST_PTR, the GPU does NOT use the pinned/mmaped memory. Please take a look at the clCreateBuffer documentation:

                             

                            This flag is valid only if host_ptr is not NULL. If specified, it indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object.

                             

                             

                            OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.

                             

                             

                            The result of OpenCL commands that operate on multiple buffer objects created with the same host_ptr or overlapping host regions is considered to be undefined.


                            The code that you provide does NOT use the mmaped memory, therefore there is no error.

                             

                             

                            So I would ask you to run the provided code and verify that you see the GART error in dmesg, then watch /proc/meminfo and see how the Slab value keeps rising. Otherwise I would love to help you reproduce the error on your system.

                             

                            Thanks for your help

                              • Re: Error handling mmap: Memory leak and "GART error"
                                himanshu.gautam

                                The AMD runtime pre-pins the buffers automatically. Using UHP (CL_MEM_USE_HOST_PTR) should solve your problem. You don't need to lock your pages in memory yourself. Or is this not in your hands?

                                  • Re: Error handling mmap: Memory leak and "GART error"
                                    ribalda

                                    It is not in my hands; I am trying to use memory from a framebuffer. But I am pretty sure the problem is not in the framebuffer: I get exactly the same error when I use a mmaped file, or even when using /dev/mem as input.

                                     

                                    In your code you have disabled the clEnqueueWriteBuffer call, so the mmaped data is not copied to the GPU; that is why there is no error. Also, without the while (1) it is impossible to measure the memory leak.

                                     

                                    Could you please confirm that you are seeing the GART error on your side, and the increase in the Slab value?

                                     

                                    Regards!

                                    • Re: Error handling mmap: Memory leak and "GART error"
                                      ribalda

                                      Himanshu, Bruhaspati, are you working on this? I still have this issue. I have also posted the problem to the Khronos forum, and they suggest that it could be an ATI implementation issue.

                                        • Re: Error handling mmap: Memory leak and "GART error"
                                          himanshu.gautam

                                          Hi Ricardo,
                                          Sorry for the delay. Busy with many things...

                                          Himanshu may have more to add on this. Have reminded him.

                                          - Bruhaspati

                                          • Re: Re: Error handling mmap: Memory leak and "GART error"
                                            himanshu.gautam

                                            Yeah, I have worked on it, and did reproduce the issue with your code. I wanted to share my code with you, but the advanced editor is not working, so I kept it on hold for some time. My only doubt was that with mmap() we are allocating a buffer which may not lie completely in RAM at any given time. Now if we create a cl_buffer (using USE_HOST_PTR), that memory region is reserved for use by the GPU, since the cl_buffer is created on top of it. It is hard to understand what would happen if I create another cl_buffer (using USE_HOST_PTR) over the same memory region. I believe the clMap->clEnqueueWrite->clUnmap sequence is doing something similar. It may not be an issue with mmap only, and may also occur with normal malloc(). Probably you can test that too.

                                            I will attach a test case explaining my concern more clearly, most likely today.

                                             

                                            EDIT: As per the test, it looks like the GART error only occurs when mmap is used. PFA the test case. I am not sure what the expected output is here. Any suggestions?

                                            Anyway, I have forwarded this to the Engineering team.

                            • Re: Error handling mmap: Memory leak and "GART error"
                              himanshu.gautam

                              Ricardo,

                              Can you experiment with MAP_SHARED and MAP_PRIVATE? Do they both result in similar behavior?

                              I hope PROT permissions are all fine.

                              -

                              Bruhaspati
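
For anyone wanting to try the MAP_SHARED versus MAP_PRIVATE comparison suggested above, a minimal sketch of the mapping step follows. It assumes a regular scratch file rather than a framegrabber device, and `map_file` is a hypothetical helper introduced only for illustration; it creates and resizes the file, so it should not be pointed at real data or a device node.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of the scratch file `path`
 * read/write using the given mmap flags (MAP_SHARED or MAP_PRIVATE).
 * Returns NULL on failure. The file is created and resized if needed. */
void *map_file(const char *path, size_t size, int flags)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {  /* make the file `size` bytes long */
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    close(fd);                              /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}
```

The returned pointer could then be handed to clCreateBuffer with CL_MEM_USE_HOST_PTR as in the test code earlier in the thread, once with each flag, to see whether the GART error and Slab growth depend on the mapping type. Both PROT_READ and PROT_WRITE are requested here so that permissions are not the variable being tested.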