I'm developing an application that relies heavily on non-blocking buffer reads and writes. I'm running it multiple times without exiting to shell for benchmarking purposes and I've noticed a slight issue.
Whenever a read/write is enqueued in non-blocking mode, it creates an additional reference to the memory object (as reported by clGetMemObjectInfo with CL_MEM_REFERENCE_COUNT). This is to be expected, and a good thing: it keeps you from deallocating the memory while a read is still pending.
However, with the AMD driver, clWaitForEvents doesn't decrement the reference count for that object, as I would expect (and as the nVidia OpenCL driver does). You can work around this manually by calling clReleaseMemObject on the object every time you use clWaitForEvents, but it's more than mildly inconvenient and rather non-intuitive.
If you're using non-blocking transfers and getting apparent memory leaks, this is probably your issue. Try calling clGetMemObjectInfo before your final deallocations and checking that the count is 1. If it isn't, there's a rather good chance it'll be 1 + the number of non-blocking reads/writes to that object.
If you've already caught this, AMD, then kudos to you! If not, any chance of getting a patch into the next version of Stream?
System Info: (1) Fedora Core 12 64bit, nVidia 256.35 Development Drivers + OpenCL 1.0 CUDA, AMD Catalyst 10.5 Drivers + OpenCL 1.0 ATI-Stream-v2.1 (145), GeForce GTX 480, Radeon 5870, 4x Opteron 2216
(2) Fedora Core 12 64bit, nVidia 256.35 Development Drivers + OpenCL 1.0 CUDA, OpenCL 1.0 ATI-Stream-v2.1 (145), 2x Quadro FX 5600, 4x Opteron 2216