Which GPU are you using?
My copy bandwidth test shows the following on a Mobility Redwood (5650):
Device[4] OpenCLPerfCopySpeed[ 0] Passed 11.9252 GBps [Buffer Size = 4 MB]
Device[4] OpenCLPerfCopySpeed[ 1] Passed 18.5277 GBps [Buffer Size = 8 MB]
Device[4] OpenCLPerfCopySpeed[ 2] Passed 19.3910 GBps [Buffer Size = 16 MB]
Device[4] OpenCLPerfCopySpeed[ 3] Passed 19.9996 GBps [Buffer Size = 32 MB]
Device[4] OpenCLPerfCopySpeed[ 4] Passed 20.3254 GBps [Buffer Size = 64 MB]
This is what I get on a 4870:
Device[4] OpenCLPerfCopySpeed[ 0] Passed 1.5776 GBps [Buffer Size = 4 MB]
Device[4] OpenCLPerfCopySpeed[ 1] Passed 1.6178 GBps [Buffer Size = 8 MB]
Device[4] OpenCLPerfCopySpeed[ 2] Passed 1.6439 GBps [Buffer Size = 16 MB]
Device[4] OpenCLPerfCopySpeed[ 3] Passed 1.6556 GBps [Buffer Size = 32 MB]
Device[4] OpenCLPerfCopySpeed[ 4] Passed 1.6622 GBps [Buffer Size = 64 MB]
My advice: buy a 5xxx GPU, or wait for the Northern Islands series, which is going to launch soon.
If you want to check the bandwidth achieved when the copy is done inside a kernel, see the MemoryOptimizations sample in SDK 2.2. I have seen low memory transfer speeds on 7xx series GPUs.
Well, the 5xxx GPUs get much more attention in OpenCL. In the meantime, IMHO you should do the copy through a memcpy kernel.
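For anyone who wants to try the memcpy-kernel route, here is a rough sketch of what such a kernel and the bandwidth arithmetic could look like. This is not the code from the MemoryOptimizations sample; the kernel name `copy_kernel`, the helper `copy_bandwidth_gbps`, and its `count_read_and_write` flag are all made up for illustration. It also assumes 1 GB = 1e9 bytes, and I do not know whether the test quoted above counts the buffer once or counts read plus write traffic, so the helper supports both.

```python
# Hypothetical OpenCL C copy kernel, kept as a string so a host program
# (for example one using the OpenCL C API) could build it. Not the SDK sample.
COPY_KERNEL_SRC = """
__kernel void copy_kernel(__global const float4 *src, __global float4 *dst)
{
    size_t gid = get_global_id(0);
    dst[gid] = src[gid];  /* one 16-byte read and one 16-byte write per work-item */
}
"""


def copy_bandwidth_gbps(buffer_bytes, elapsed_seconds, count_read_and_write=False):
    """Return copy bandwidth in GB/s, assuming 1 GB = 1e9 bytes.

    buffer_bytes: size of the buffer being copied, in bytes.
    elapsed_seconds: measured time for one copy (e.g. from event profiling).
    count_read_and_write: if True, count traffic in both directions
        (bytes read plus bytes written); this is an assumption about how
        a given benchmark reports its numbers, not a known convention.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    bytes_moved = buffer_bytes * (2 if count_read_and_write else 1)
    return bytes_moved / elapsed_seconds / 1e9


if __name__ == "__main__":
    size = 4 * 1024 * 1024  # 4 MB buffer, as in the first test case above
    # 0.001 s is a made-up timing, used only to show the arithmetic.
    print(copy_bandwidth_gbps(size, 0.001))
    print(copy_bandwidth_gbps(size, 0.001, count_read_and_write=True))
```

Launch the kernel with one work-item per float4 element (buffer size in bytes divided by 16) and time it with event profiling, then feed the elapsed time into the helper to compare against the copy-test numbers.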