I'm trying to implement exactly the same scenario. It works quite well in CUDA, but I can't make it work with an AMD GPU (5850). I'm probably doing something wrong (e.g. not calling clFlush at the right moments). I have also heard that overlapping is automatically disabled in the profiler. Is that correct? Will it also be disabled for queues created with CL_QUEUE_PROFILING_ENABLE?
Do you have any working code samples of such a pipeline? Also, with SDK 2.7, is it still required to set GPU_ASYNC_MEM_COPY to 2?
I have the same question as Tim, but I'm using a 7970. BTW, can the Read/Write Rect functions (clEnqueueReadBufferRect / clEnqueueWriteBufferRect) also be overlapped? It would help if you could provide some working code examples.