Very slow CopyBuffer

Discussion created by Raistmer on Sep 22, 2010
Latest reply on Sep 22, 2010 by nou
How to speed it up?

I need to copy 8 MB of data between two buffers, both of which are supposed to be in GPU memory.
I use clEnqueueCopyBuffer for this operation.
But in the profiler I see very poor performance from this copy.
It takes more than 10 ms (almost 11 ms) to copy 8 MB of data from one GPU memory location to another. That is less than 1 GB/s. From everything I have read about GPU capabilities, I expected a much higher number.
What can be done to speed up the memory copy? Should I implement a memcpy kernel instead of calling clEnqueueCopyBuffer?
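
For reference, here is the arithmetic behind the "less than 1 GB/s" figure as a small C helper. It is only a sketch: the function name bandwidth_gb_per_s is made up for illustration, 8 MB is assumed to mean 8 * 1024 * 1024 bytes, and 1 GB is taken as 10^9 bytes. The elapsed time is passed in nanoseconds, on the assumption that the timing comes from something like OpenCL event profiling, which reports timestamps in nanoseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of OpenCL): effective bandwidth in GB/s,
 * with 1 GB = 1e9 bytes, for `bytes` moved in `elapsed_ns` nanoseconds.
 * Bytes per nanosecond is numerically the same as GB per second.
 */
double bandwidth_gb_per_s(uint64_t bytes, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0) {
        return 0.0; /* avoid division by zero on a missing measurement */
    }
    return (double)bytes / (double)elapsed_ns;
}
```

Calling bandwidth_gb_per_s(8u * 1024u * 1024u, 11000000u) gives about 0.76 GB/s for the 11 ms copy described above.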