We are trying to process images from a framegrabber without copying them to an intermediate buffer. The data arrives in an mmapped buffer.
Every time we use the mmapped buffer with OpenCL we get an error message in syslog, and the Slab section of /proc/meminfo grows. On our system we are leaking around 1 MB per second!
syslog error: [fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
To demonstrate this error we have written a simple program with a loop that maps and unmaps an mmapped buffer. While running the program, please run dmesg and examine /proc/meminfo. The exact same program works fine on other OpenCL implementations (e.g. Nvidia's). We are using fglrx v13.4, but the error also exists in previous versions.
After some debugging we believe it is caused by the function ATI_API_CALL KCL_LockUserPages (firegl_public.c), which calls get_user_pages; that function returns -EFAULT on mmapped or I/O pages (and also checks for write permissions):
if (!vma ||
    (vma->vm_flags & (VM_IO | VM_PFNMAP)) ||
    !(vm_flags & vma->vm_flags))
        return i ? : -EFAULT;
Please understand that this is a huge restriction for us: in order to work with images from the framegrabber we would have to copy its contents to another area for no reason, reducing our framerate dramatically. It is also hard to understand why we would not be able to use mmapped buffers.
This problem also seems related: http://devgurus.amd.com/thread/160336
This was reproducible. I am forwarding it to the proper team after some more investigation.
One small question: the original code you provided gave a compilation error saying "close() was not declared". Any help on how to fix that? I am currently running the code after commenting that line out.
Just add
#include <unistd.h>
at the beginning of the source file.
Thanks
Hi,
After more investigation, it looks like the issue is invalid. First, I was not running dmesg --clear earlier, and therefore kept "reproducing" the error log, as it was still present in the ring buffer. Secondly, the code has an infinite loop, which has to be disabled to run the test to completion. Thirdly, I do get the "GART ..." error with the current test code because the following code is buggy (IMHO):
The memory region pointed to by pinned_mem has already been added to the GPU address space (GART memory). Later we are trying to re-lock it using map/unmap calls.
Checkout the attached code for MatrixTranspose Sample, which does not give any GART errors for me.
I am not able to attach that code because of some forum issue. Here is the change I made in the original code, after which no GART errors are observed:
//while (1)
{
#if 1
    pinned_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, BUF_SIZE, pinned_mem, &err);
    cl_err_exit(err, "clCreateBuffer");
#else
    pinned_buffer = clCreateBuffer(context, CL_MEM_READ_WRITE, BUF_SIZE, NULL, &err);
    cl_err_exit(err, "clCreateBuffer");
#endif
    device_buffer = clCreateBuffer(context, CL_MEM_READ_ONLY, BUF_SIZE, NULL, &err);
    cl_err_exit(err, "clCreateBuffer");
    pinned_mem = (float *) clEnqueueMapBuffer(gpu_queue, pinned_buffer, CL_TRUE, CL_MAP_READ, 0, BUF_SIZE, 0, NULL, NULL, &err);
    cl_err_exit(err, "clEnqueueMapBuffer");
    //err = clEnqueueWriteBuffer(gpu_queue, pinned_buffer, CL_FALSE, 0, BUF_SIZE, pinned_mem, 0, NULL, NULL);
    //cl_err_exit(err, "clEnqueueWriteBuffer");
    err = clEnqueueUnmapMemObject(gpu_queue, pinned_buffer, pinned_mem, 0, NULL, NULL);
    cl_err_exit(err, "clEnqueueUnmapMemObject");
    clFinish(gpu_queue);
    clReleaseMemObject(pinned_buffer);
    clReleaseMemObject(device_buffer);
}
Hello Himanshu,
Thanks for your response.
About your point 1)
I was not doing dmesg --clear earlier, and therefore reproduced the error log , as it was consistently present in ring_buffer
I don't understand what you mean here. Did you run the program we gave you? Did you get the GART error message in dmesg?
About your point 2)
The reason behind the infinite loop is so that you can inspect the Slab counter in /proc/meminfo. I believe that if the implementation is right, the Slab size should never increase, even if the program runs for days. In our test setup we are losing megabytes per second. It is impossible for us to run a program for more than a couple of minutes if it uses mmapped memory (framebuffer), which is clearly wrong, especially since this does not happen with other implementations (e.g. Nvidia, Intel).
About your point 3)
I don't see why our code is wrong. We NEED to use the data from the framebuffer (or the mmapped file); if you disable CL_MEM_USE_HOST_PTR, the GPU does NOT use the pinned/mmapped memory. Please take a look at the documentation of clCreateBuffer:
This flag is valid only if host_ptr is not NULL. If specified, it indicates that the application wants the OpenCL implementation to use memory referenced by host_ptr as the storage bits for the memory object.
OpenCL implementations are allowed to cache the buffer contents pointed to by host_ptr in device memory. This cached copy can be used when kernels are executed on a device.
The result of OpenCL commands that operate on multiple buffer objects created with the same host_ptr or overlapping host regions is considered to be undefined.
The code that you provided does NOT use the mmapped memory; therefore there is no error.
So I would ask you to run the provided code and verify that you see the GART error in dmesg, then watch /proc/meminfo and see how the Slab value keeps rising. Otherwise I would be glad to help you reproduce the error on your system.
Thanks for your help
The AMD runtime pre-pins buffers automatically, so using UHP (CL_MEM_USE_HOST_PTR) should solve your problem. You don't need to lock your pages in memory yourself. Or is this not in your hands?
It is not in my hands: I am trying to use memory from a framebuffer. But I am pretty sure the problem is not in the framebuffer; I get the exact same error when I try to use an mmapped file, or even when using /dev/mem as input.
In your code you have disabled the clEnqueueWriteBuffer call, so the mmapped data is not copied to the GPU; that is why there is no error. Also, without the while (1) it is impossible to measure the memory leak.
Could you please confirm that you are seeing the GART error on your side, and the increase of the Slab section?
Regards!
Himanshu, Bruhaspati, are you working on this? I still have this issue. I have also posted the problem to the Khronos forum, and they suggest that it could be an ATI implementation issue.
Hi Ricardo,
Sorry for the delay. Busy with many things...
Himanshu may have more to add on this. Have reminded him.
- Bruhaspati
Yeah, I have worked on it, and did reproduce the issue with your code. I wanted to share my code with you, but the advanced editor is not working,
so I kept it on hold for some time. My only doubt was that with mmap() we are allocating a buffer which may not lie completely in RAM at any given time. Now if we create a cl_buffer (using USE_HOST_PTR), that memory region has been reserved for use by the GPU, as the cl_buffer is created on top of it. It is hard to understand what would happen if I create another cl_buffer (using USE_HOST_PTR) with the same memory region. I believe the clMap->clEnqueueWrite->clUnmap sequence is doing something similar. It may not be an issue with mmap only, and may also occur with normal malloc(). Perhaps you can test that too.
I will attach a test-case explaining my concern more clearly, mostly today.
EDIT: As per the test, it looks like the GART error only occurs when mmap is used. PFA the testcase. I am not sure about the expected output here. Any suggestions?
Anyways I have forwarded this to Engineering team.
Hello Himanshu:
I have tried your example (mmap_gart_error.zip) on my platform and on another PC, and both show the GART error. Since the program only runs for one cycle, I could not measure the memory leakage.
In my implementation I can guarantee that a particular sector of memory is only pointed to by one OpenCL buffer, so there should be no problem. In the example I provided (test.c), you can see that at any time, any sector is pointed to by only one OpenCL buffer.
Please make sure that your engineering team takes a look at the GART error AND ALSO THE MEMORY LEAK. It is impossible to run it with a framegrabber otherwise. And I am sure that it is an application case not only for us, but for other clients as well.
Regards
I have shared both testcases on the bug report, and also mentioned the memory leak in your testcase. Hopefully this will be taken care of properly now.
Did you get any feedback? It has been 4 months since I last heard back from you.
Ricardo,
Can you experiment with MAP_SHARED and MAP_PRIVATE? Do they both result in similar behavior?
I hope the PROT permissions are all fine.
-
Bruhaspati
Hello Bruhaspati,
I have run the initial experiment (test.c) with MAP_PRIVATE, and I get exactly the same results as with MAP_SHARED: GART error and memory leakage.
But even if it worked, I cannot use MAP_PRIVATE. In the mmap documentation you can read:
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the mapping are not
visible to other processes mapping the same file, and are not carried
through to the underlying file. It is unspecified whether changes made
to the file after the mmap() call are visible in the mapped region.
So it is definitely not an option for a framegrabber.
Oh... sorry that it did not work, and thanks for trying it out.
Now it is up to the engineering team to fix this...
We have raised the ticket, so it should happen down the line.
Thanks for your patience.
If there is any interim update, Himanshu will post it for you.......
-
Bruhaspati
Hello Himanshu, Bruhaspati,
Could you check what the status of this is? Can we do anything to help you debug it? Is there any software you need us to test?
This is critical functionality for us. This bug is really a blocker for our project.
Thanks again!
I guess it will take some more time as of now.
It has been more than 2 months now. Can we get some kind of feedback?
I have asked for the status.
It has been now 3 MONTHS after the first post and 1 month after you asked for the status. Is there any news?
Hi Ribalda,
I will check this and let you know on Friday.
Meanwhile, did you check with the latest beta drivers?
Best
Bruhaspati
Exactly the same results:
[ 95.303378] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
[ 95.303450] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
[ 95.303515] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
[ 95.303833] <3>[fglrx:MCIL_LockMemory] *ERROR* Could not lock memory into GART space
Thanks for testing on the latest driver....
I checked internally...and this issue is not resolved yet.
It will take some time...
My best advice to you is:
Don't count on this feature for now (or) near future.
Not many developers are looking for such a feature... (or are they?)
So, it will take sometime to resolve this.
I have already alerted stakeholders about this issue.
You may have to be paaatttttiiiieeeeennnnnntttttttt.
That's all I can say for now...
Thanks for reporting this issue and giving us good test-cases!
Thanks,
Best Regards,
Sarnath
The issue was reproducible on Tahiti with the latest driver. We will keep you updated.