Simultaneous data transfer and kernel execution

Discussion created by tweenk on Apr 3, 2011
Latest reply on Apr 3, 2011 by tweenk

I have a piece of OpenCL code in which the data transfers (clEnqueueReadBuffer / clEnqueueWriteBuffer) take about as long as the computation. I would like to allocate two input and two output buffers and run the kernel on one pair of buffers while I read from and write to the other pair. Is this possible in AMD's OpenCL implementation?

I tried using an out-of-order queue, but I did not achieve a speedup over a synchronous version.