When I run my application, I first queue up around 50 sets of kernels, each set containing around 10 kernels.
The queued kernels wait for a user event before beginning. I am finding that simply queuing the kernels into OpenCL
queues eats up around 1.5 GB of host memory, and even after the kernels have been executed, the memory does not
get cleaned up.
How can I troubleshoot this issue? And why does queuing consume so much memory? Each set of kernels waits for a host-to-device
transfer of a 9 MB buffer before it executes, but I maintain a pool of these buffers, so only a handful are allocated at any time.
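For context, the buffer-pool arrangement described above can be sketched as follows. This is a minimal illustration, not the poster's actual code: a real pool would hold `cl_mem` handles created with `clCreateBuffer`, but plain `malloc` stands in here so the sketch is self-contained, and the pool size of 4 is an assumption for "a handful".

```c
#include <stdlib.h>

#define POOL_SIZE 4                 /* "a handful" of buffers (assumed) */
#define BUF_BYTES (9 * 1024 * 1024) /* 9 MB per host-to-device transfer */

typedef struct {
    void *buf[POOL_SIZE];   /* stand-in for cl_mem handles */
    int   in_use[POOL_SIZE];
} buffer_pool;

static void pool_init(buffer_pool *p) {
    for (int i = 0; i < POOL_SIZE; i++) {
        p->buf[i] = malloc(BUF_BYTES);
        p->in_use[i] = 0;
    }
}

/* Returns a free buffer, or NULL if all POOL_SIZE buffers are busy. */
static void *pool_acquire(buffer_pool *p) {
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) { p->in_use[i] = 1; return p->buf[i]; }
    }
    return NULL;
}

/* Marks a buffer free again, e.g. once its transfer has completed. */
static void pool_release(buffer_pool *p, void *b) {
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->buf[i] == b) { p->in_use[i] = 0; return; }
    }
}

static void pool_destroy(buffer_pool *p) {
    for (int i = 0; i < POOL_SIZE; i++) free(p->buf[i]);
}
```

With a pool like this, total buffer memory is capped at `POOL_SIZE * BUF_BYTES` (36 MB here), which is why the 1.5 GB of host memory cannot be coming from these transfer buffers.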
Hi,
Actually, the runtime maintains the host-side queues and uses system resources to hold the enqueued commands. Due to its deferred-allocation policy, the runtime may consume some extra space to copy data into temporary runtime buffers. In addition, the runtime maintains a resource cache which may consume memory even after the memory objects have been released (see: OpenCL memory buffer and image less than 64 Mb not releasing?). The actual amount of resource consumption depends on the driver and the particular application scenario. From your description, it's difficult to say whether the above behaviour is expected in your case. However, since it seems unexpected to you, it would be helpful if you could provide a reproducible test case that manifests the problem.
Regards,
Thanks, Dipak. It will take me some time to create a reproducer, but let me rephrase the question. My application works as follows: queue up N kernels, wait for them to complete, then queue up another N kernels, wait for them to complete, and so on. If every time I queue N kernels the runtime consumes X amount of memory and doesn't release it, then I clearly have a problem. So, if the driver is working correctly, I assume that at some point it will either release or reuse the memory resources it has allocated. I will test my application to see whether I run out of memory or the driver's resource usage stabilizes over time.
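One simple way to run that stabilization test is to sample the process's resident set size after each "enqueue N kernels and wait" iteration. A minimal sketch, assuming Linux (it reads `VmRSS` from `/proc/self/status`; other platforms need a different mechanism):

```c
#include <stdio.h>
#include <string.h>

/* Returns this process's resident set size in kB, or -1 on failure.
   Linux-specific: parses the VmRSS line of /proc/self/status. */
static long current_rss_kb(void) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return -1;
    char line[256];
    long rss = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            sscanf(line + 6, "%ld", &rss);
            break;
        }
    }
    fclose(f);
    return rss;
}
```

Logging `current_rss_kb()` after each batch makes the pattern obvious: if the values climb linearly with the number of batches and never plateau, either the application or the runtime is holding memory; if they level off after a few batches, the runtime is reusing or capping its resources.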
However, as the other poster stated, it would be useful to be able to control the driver's peak host-memory usage, because some applications need this even at the cost of some performance.
No problem. When you have it ready, please share it with us.
Yes, you are right. If all the resources are released properly and there is no memory leak, then you should not observe a constant increase in resource usage; if you do, then there is indeed some issue. Please make sure that, at the application level, you're properly releasing the memory objects after each iteration.
I agree that controlling the peak host memory usage may be useful for certain scenarios. But right now, no such control is exposed at the application or user level.
Regards,
Thanks. Are there tools to detect resource leaks, for example cl_event objects that have not been released?
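Besides external tools, a lightweight do-it-yourself check is to route all event creation and release through counting wrappers, so unreleased events show up as a nonzero counter at shutdown. The names below (`my_retain_event`, `my_release_event`, `outstanding_events`) are hypothetical, invented for this sketch; in real code each wrapper would also forward to `clRetainEvent` / `clReleaseEvent` on an actual `cl_event`:

```c
/* Leak-counting sketch: not a real OpenCL API. A global counter tracks
   how many event references the application currently holds. */
static long g_live_events = 0;

/* Call whenever an event is created or retained
   (where real code would call clRetainEvent or receive a new cl_event). */
static void my_retain_event(void)  { g_live_events++; }

/* Call whenever an event is released
   (where real code would call clReleaseEvent). */
static void my_release_event(void) { g_live_events--; }

/* At shutdown, a nonzero value indicates leaked event references. */
static long outstanding_events(void) { return g_live_events; }
```

The same counting discipline works for any reference-counted OpenCL object: buffers, command queues, kernels, and programs.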
You may try CodeXL.
Thanks, I will try this.
So, I did an experiment and simply increased the size of N in my batches of enqueued kernels. The memory usage grows linearly with the number of kernels enqueued. It doesn't look like the driver caps how much memory it uses just to enqueue kernels. Ideally, if many kernels are being enqueued, resources should be allocated for at most K kernels at a time, to limit memory usage. As it is, the driver will happily allocate all of my available system RAM. I would call this a bug - what do you think?
OK, it was my mistake. I had a memory leak in my application. All is working fine now that I plugged that leak.
Thanks for your help.
Good to hear that you've found the problem and it's working fine now.