First of all, my specs: an ATI Radeon HD 5770 and the OpenCL 2.01 implementation.
My host code runs many iterations. During each iteration it creates several buffer objects, and one of them, let's call it outputKeyIntersections, can be very large depending on various conditions. The host never writes to or reads from this buffer; only the kernels use it, to write and read temporary data. After creating all the buffers and the other OpenCL objects, two kernels are executed.
Here's the problem: when the size required for this buffer stays very large across iterations (for example, 200 MB for more than 2-3 iterations), at some specific iteration the program fails with error code -4 when it tries to create outputKeyIntersections with clCreateBuffer. That code corresponds to CL_MEM_OBJECT_ALLOCATION_FAILURE.
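To make the error code concrete, here is a minimal sketch of a lookup that turns a few numeric OpenCL error codes into their names. The function name describe_cl_error is hypothetical (it is not part of OpenCL); the numeric values are the ones defined in the Khronos CL/cl.h header, and only a handful are listed.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the OpenCL API.
   Maps a few numeric error codes (values as defined in CL/cl.h)
   to their symbolic names; anything else is reported as unknown. */
const char *describe_cl_error(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -61: return "CL_INVALID_BUFFER_SIZE";
    default:  return "UNKNOWN_CL_ERROR";
    }
}
```

In the host code, the value passed in would be the errcode_ret written by the buffer-creation call, so the failing iteration can be logged with a readable name instead of a bare number.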
In theory this should be impossible, because the other buffers don't come close to filling the video card's memory: summed together they occupy at most 300-400 MB, while the 5770 has 1 GB available, so there shouldn't be any problem.
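The arithmetic behind that expectation can be written down as a small check. This is only a sketch under stated assumptions: check_buffer_budget and the budget_result enum are hypothetical names, and the two limits are passed in as plain numbers. In real host code they could be obtained from clGetDeviceInfo with CL_DEVICE_GLOBAL_MEM_SIZE (total global memory) and CL_DEVICE_MAX_MEM_ALLOC_SIZE (largest single allocation); what the 5770 actually reports for them is not assumed here.

```c
#include <stddef.h>

/* Hypothetical result type for the sketch below. */
typedef enum {
    BUDGET_FITS,
    BUDGET_EXCEEDS_SINGLE_ALLOC_LIMIT,
    BUDGET_EXCEEDS_TOTAL_MEMORY
} budget_result;

/* Hypothetical helper: checks a list of buffer sizes (in bytes) against
   a per-allocation limit and a total global-memory limit.
   Every buffer that is still alive must be in the list, including any
   buffer from an earlier iteration that has not been released yet. */
budget_result check_buffer_budget(const unsigned long long *sizes,
                                  size_t count,
                                  unsigned long long global_mem_size,
                                  unsigned long long max_alloc_size)
{
    unsigned long long total = 0;
    for (size_t i = 0; i < count; ++i) {
        /* A single buffer may be too large even if the sum would fit. */
        if (sizes[i] > max_alloc_size)
            return BUDGET_EXCEEDS_SINGLE_ALLOC_LIMIT;
        total += sizes[i];
    }
    return (total > global_mem_size) ? BUDGET_EXCEEDS_TOTAL_MEMORY
                                     : BUDGET_FITS;
}
```

The point of separating the two limits is that a sum well under 1 GB does not by itself guarantee that each individual allocation is accepted, and buffers that were never freed keep counting toward the total.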
I'm starting to think that the buffers aren't being released properly. Could something be going wrong when I call clReleaseMemObject? Is it possible that it doesn't free all the memory occupied by the buffers?