Is compute + data movement overlap possible with clEnqueueReadBuffer()/clEnqueueWriteBuffer() calls?

I have an AMD FirePro V4800 discrete graphics card and am using the AMD APP SDK 2.6 on a Linux box running the 3.0.0-16-generic kernel.

I want to overlap some kernel execution with some data transfer to the GPU.

I see that the TransferOverlap sample in the SDK uses clEnqueueMapBuffer().

I tried to do the same with clEnqueueReadBuffer()/clEnqueueWriteBuffer() on an out-of-order queue, but the transfer and the kernel do not overlap.

So is it even possible to do this with clEnqueueReadBuffer()/clEnqueueWriteBuffer()?


[ I couldn't find anything in the OpenCL 1.1 spec that says it should not be possible. I am passing CL_FALSE (non-blocking) and synchronizing the data movement with events. When I profile and look at the order of command execution, I don't see any out-of-order execution, even though the ENQUEUED timestamp from profiling is earlier than the kernel completion time. ]
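
To make "no out-of-orderness" concrete, here is a minimal sketch in C of how the overlap between the kernel and the transfer could be checked from profiling timestamps. It assumes the queue was created with CL_QUEUE_PROFILING_ENABLE and that the start and end values were read with clGetEventProfilingInfo() using CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. The `cmd_interval` struct and the `overlap_ns` helper are hypothetical names used only for illustration and are not part of OpenCL or the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: device-side execution interval of one command,
 * in nanoseconds, as reported by the profiling counters. */
typedef struct {
    uint64_t start_ns; /* value read for CL_PROFILING_COMMAND_START */
    uint64_t end_ns;   /* value read for CL_PROFILING_COMMAND_END */
} cmd_interval;

/* Hypothetical helper: number of nanoseconds during which both commands
 * were executing at the same time. Returns 0 if they ran back to back
 * or were fully serialized. */
uint64_t overlap_ns(cmd_interval a, cmd_interval b)
{
    assert(a.end_ns >= a.start_ns);
    assert(b.end_ns >= b.start_ns);

    /* The shared window starts at the later start and ends at the earlier end. */
    uint64_t lo = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t hi = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;

    return hi > lo ? hi - lo : 0;
}
```

With the kernel's interval as `a` and the write's interval as `b`, a result of 0 would correspond to the serialized behavior described above, while any positive value would mean the two commands actually ran concurrently on the device.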