ribalda
Journeyman III

Error handling mmap: Memory leak and "GART error"

We are trying to process images from a framegrabber without copying them to an intermediate buffer. The data appears in an mmapped buffer.

Every time we use the mmapped buffer in OpenCL we get an error message in syslog, and the Slab section of /proc/meminfo increases. On our system we are leaking around 1 MB per second!

syslog error: [fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space

In order to demonstrate this error we have made a simple program with a loop that maps and unmaps an mmapped buffer. While running the program, please run dmesg and examine /proc/meminfo. The exact same program works fine on other OpenCL implementations (e.g. NVIDIA). We are using fglrx v13.4, but the error also exists in previous versions.

After some debugging we believe it is caused by the function ATI_API_CALL KCL_LockUserPages (firegl_public.c), which calls get_user_pages; get_user_pages returns -EFAULT on mmapped or I/O pages (and also checks for write permissions):

if (!vma ||
    (vma->vm_flags & (VM_IO | VM_PFNMAP)) ||
    !(vm_flags & vma->vm_flags))
        return i ? : -EFAULT;

Please realize that this is a huge restriction for us: in order to work with images from the framegrabber, we would have to copy their contents to another area for no reason, reducing our framerate dramatically. It is also incomprehensible why we should not be able to use mmapped buffers.

0 Likes
26 Replies
ribalda
Journeyman III

This problem also seems related: http://devgurus.amd.com/thread/160336

0 Likes

This was reproducible. I am forwarding it to the proper team after some more investigation.

One small question: the original code you gave was producing a compilation error saying "close() was not declared". Any help on how to fix that? I am currently running the code after commenting that line out.

0 Likes

Just add

#include <unistd.h>

to the beginning of the source file.

Thanks

0 Likes

Hi,

After more investigation, it looks like the issue is invalid. I was not running dmesg --clear earlier, and therefore kept seeing the error log, as it was consistently present in the ring buffer. Secondly, the code has an infinite loop, which has to be disabled to run the test to completion. And thirdly, I do get the "GART ..." error with the current test code because the following code is buggy (IMHO):

The memory region pointed to by pinned_mem has already been added to the GPU address space (GART memory). Later we try to re-lock it using map/unmap calls.

Check out the attached code for the MatrixTranspose sample, which does not give any GART errors for me.

0 Likes

I am not able to attach that code because of some forum issue. Here is the change I made in the original code, after which no GART errors are observed:

        //while (1)
        {
#if 1
                pinned_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, BUF_SIZE, pinned_mem, &err);
                cl_err_exit(err, "clCreateBuffer");
#else
                pinned_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, BUF_SIZE, NULL, &err);
                cl_err_exit(err, "clCreateBuffer");
#endif
                device_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY, BUF_SIZE, NULL, NULL);
                cl_err_exit(err, "clCreateBuffer");

                pinned_mem = (float *) clEnqueueMapBuffer(gpu_queue, pinned_buffer, CL_TRUE, CL_MAP_READ, 0, BUF_SIZE, 0, NULL, NULL, &err);
                cl_err_exit(err, "clEnqueueMapBuffer");

                //err = clEnqueueWriteBuffer(gpu_queue, pinned_buffer, CL_FALSE, 0, BUF_SIZE, pinned_mem, 0, NULL, NULL);
                //cl_err_exit(err, "clEnqueueWriteBuffer");

                err = clEnqueueUnmapMemObject(gpu_queue, pinned_buffer, pinned_mem, 0, NULL, NULL);
                cl_err_exit(err, "clEnqueueUnmapMemObject");

                clFinish(gpu_queue);
                clReleaseMemObject(pinned_buffer);
                clReleaseMemObject(device_buffer);
        }

0 Likes

Hello Himanshu,


Thanks for your response.


About your point 1):

"I was not doing dmesg --clear earlier, and therefore reproduced the error log, as it was consistently present in the ring buffer."

I don't understand what you mean here. Did you run the program we gave you? Did you get the GART error message in dmesg?


About your point 2):

The reason behind the infinite loop is that it lets you inspect the Slab counter in /proc/meminfo. I believe that if the implementation is right, the Slab size should never increase, even if the program runs for days. In our test setup we are losing MEGABYTES per second. It is impossible for us to run a program for more than a couple of minutes if it uses mmapped memory (framegrabber), which is clearly wrong, especially since this does not happen with other implementations (e.g. NVIDIA, Intel).


About your point 3):

I don't see why our code is wrong. We NEED to use the data from the framegrabber (or the mmapped file). If you disable CL_MEM_USE_HOST_PTR, the GPU does NOT use the pinned/mmapped memory. Please take a look at the documentation for clCreateBuffer:


"This flag is valid only if host_ptr is not NULL. If specified, it indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object.

OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.

The result of OpenCL commands that operate on multiple buffer objects created with the same host_ptr or overlapping host regions is considered to be undefined."

The code that you provided does NOT use the mmapped memory; therefore there is no error.

So I would ask you to run the provided code and verify that you see the GART error in dmesg, then watch /proc/meminfo and see how the Slab value keeps rising. Otherwise, I would love to help you reproduce the error on your system.

Thanks for your help

0 Likes

The AMD runtime pre-pins the buffers automatically. Using UHP (CL_MEM_USE_HOST_PTR) should solve your problem. You don't need to lock your pages in memory yourself. Or is this not in your hands?

0 Likes

It is not in my hands: I am trying to use some memory from a framegrabber. But I am pretty sure the problem is not in the framegrabber; I get the exact same error when I try to use an mmapped file, or even when using /dev/mem as input.

In your code you have disabled the clEnqueueWriteBuffer, so the mmapped data is never copied to the GPU; that is why there is no error. Also, without the while (1) it is impossible to measure the memory leak.

Could you please confirm that you are seeing the GART error on your side, and the increase of the Slab section?

Regards!

0 Likes

Himanshu, Bruhaspati, are you working on this? I still have this issue. I have also posted the problem to the Khronos forum, and they suggest it could be an ATI implementation issue.

0 Likes

Hi Ricardo,
Sorry for the delay. Busy with many things...

Himanshu may have more to add on this. Have reminded him.

- Bruhaspati

0 Likes

Yeah, I have worked on it, and did reproduce the issue with your code. I wanted to share my code with you, but the advanced editor is not working, so I kept it on hold for some time. My only doubt was that with mmap() we are allocating a buffer which may not lie completely in RAM at any given time. Now if we create a cl_buffer (using USE_HOST_PTR), that memory region is blocked for use by the GPU, as the cl_buffer is created on top of it. It is hard to tell what would happen if I created another cl_buffer (using USE_HOST_PTR) over the same memory region. I believe the clMap -> clEnqueueWrite -> clUnmap sequence is doing something similar. It may not be an issue with mmap only, and may occur with a normal malloc() too. Perhaps you can test that as well.

I will attach a test-case explaining my concern more clearly, mostly today.

EDIT: As per the test, it looks like the GART error only occurs when mmap is used. PFA the testcase. I am not sure about the expected output here. Any suggestions?

Anyway, I have forwarded this to the engineering team.

0 Likes

Hello Himanshu,


I have tried your example (mmap_gart_error.zip) on my platform and on another PC, and both show the GART error. Since the program only runs for one cycle, I could not measure the memory leakage.

In my implementation I can guarantee that a particular sector of memory is pointed to by only one OpenCL buffer, so there should be no problem. In the example I provided (test.c), you can see that at any time, any sector is pointed to by only one OpenCL buffer.

Please make sure that your engineering team takes a look at the GART error AND ALSO THE MEMORY LEAK. It is impossible to run with a framegrabber otherwise. And I am sure this is an application case not only for us, but for other clients too.

Regards

0 Likes

I have shared both test cases on the bug report, and also mentioned the memory leak in your test case. Hopefully this will be taken care of properly now.

0 Likes

Did you get any feedback? It has been 4 months since I last heard back from you.

0 Likes
himanshu_gautam
Grandmaster

Ricardo,

Can you experiment with MAP_SHARED and MAP_PRIVATE? Do they both result in similar behavior?

I hope PROT permissions are all fine.

- Bruhaspati

0 Likes

Hello Bruhaspati

I have run the initial experiment (test.c) with MAP_PRIVATE and I get exactly the same results as with MAP_SHARED: the GART error and the memory leakage.

But even if it worked, I could not use MAP_PRIVATE. In the mmap man page you can read:

MAP_PRIVATE
       Create a private copy-on-write mapping. Updates to the mapping are not
       visible to other processes mapping the same file, and are not carried
       through to the underlying file. It is unspecified whether changes made
       to the file after the mmap() call are visible in the mapped region.

So it is definitely not an option for a framegrabber.

0 Likes

Oh, sorry that it did not work, and thanks for trying it out.

Now it is up to the engineering team to fix this. We have raised the ticket, so it should happen down the line.

Thanks for your patience. If there is any interim update, Himanshu will post it for you.

-

Bruhaspati

0 Likes

Hello Himanshu, Bruhaspati,

Could you check what the status of this is? Can we do anything to help you debug it? Is there any software you need us to test?

This is critical functionality for us. This bug is really a blocker for our project.


Thanks again!

0 Likes

I guess it will take some more time as of now.

0 Likes

It has been more than 2 months now. Can we get some kind of feedback?

0 Likes

I have asked for the status.

0 Likes

It has now been 3 MONTHS since the first post and 1 month since you asked for the status. Is there any news?

0 Likes

Hi Ribalda,

I will check this and let you know on Friday.

Meanwhile, did you check with the latest beta drivers?

Best

Bruhaspati

0 Likes

Exactly the same results:

[   95.303378] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
[   95.303450] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
[   95.303515] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
[   95.303833] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space

0 Likes

Thanks for testing on the latest driver.

I checked internally, and this issue is not resolved yet. It will take some time.

My best advice to you is: don't count on this feature for now or in the near future. Not many developers are looking for such a feature (or are they?), so it will take some time to resolve.

I have already alerted the stakeholders about this issue. You may have to be patient. That's all I can say for now.

Thanks for reporting this issue and giving us good test cases!

Thanks,

Best Regards,

Sarnath

0 Likes

The issue was reproducible on Tahiti with the latest driver. We will keep you updated.

0 Likes