If an application creates a context on its main thread and then calls OpenCL functions that use this context from other threads (for example, memory allocation or deallocation), the OpenCL runtime creates 3 semaphores for each such thread but does not free them when the thread is destroyed.
Sample code illustrating the problem is attached.
Instructions -- on a machine with an AMD GPU (we tested on HD 7970) running Windows 7 x64:
1. Compile the application
2. Run Windows Task Manager
3. Make sure that the Handles column is shown (View > Select Columns...)
Note: Task Manager on Windows 8 does not seem to have this menu; use another monitoring tool on that OS.
4. Run the application. Hit ENTER several times. Note that every time ENTER is pressed, the Handles count increases by 3.
5. Use WinDbg from the Windows SDK to find out that the leaking handles are semaphores, and inspect the call stack within amdocl[64].dll