I am using clEnqueueReadBuffer/clEnqueueWriteBuffer in blocking mode on a Radeon HD 5750.
However, the write throughput is lower than the result reported by PCIeSpeedTest (ATI Stream Power Toys),
and the read throughput is much lower than the write throughput. Why?
size = 1024*1024*64;
NUM_TIMING_LOOPS = 100;
buf = clCreateBuffer(context, CL_MEM_READ_WRITE, size, NULL, &errcode);
stopwatch.start(); // use PerformanceCounter
for (int i = 0; i < NUM_TIMING_LOOPS; i++)
[ 67108864 bytes] CPU->GPU= 4.851 GB/sec, GPU->CPU= 861.791 MB/sec
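For reference, a throughput figure like the ones above can be derived from the buffer size, the loop count, and the elapsed time. The following is only a minimal sketch: the function name `throughput_gb_per_sec` is hypothetical, and it assumes 1 GB = 2^30 bytes, since the convention used by the tool that printed the line above is not stated.

```c
#include <stddef.h>

/* Convert a timed transfer loop into throughput in GB/s.
 * Assumption: 1 GB = 2^30 bytes (the original tool's convention is unknown).
 * Returns 0.0 for a non-positive elapsed time or loop count. */
double throughput_gb_per_sec(size_t bytes_per_transfer, int loops, double elapsed_sec)
{
    if (elapsed_sec <= 0.0 || loops <= 0) {
        return 0.0;
    }
    /* Total bytes moved across all timed iterations. */
    double total_bytes = (double)bytes_per_transfer * (double)loops;
    return total_bytes / elapsed_sec / (1024.0 * 1024.0 * 1024.0);
}
```

With size = 1024*1024*64 and NUM_TIMING_LOOPS = 100, the loop moves 6.25 GB in total under this convention, so `throughput_gb_per_sec(size, NUM_TIMING_LOOPS, elapsed)` is simply 6.25 divided by the elapsed seconds.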
This is due to a difference in how PCIeSpeedTest and the OpenCL runtime are implemented. PCIeSpeedTest transfers directly to and from pinned memory, whereas the OpenCL path transfers the data over PCIe and then makes an additional copy into user memory. We are working on a more optimized path that can avoid this extra copy under certain conditions in a future release.
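To see why an extra copy lowers the measured number, here is a rough model, not a description of the actual runtime. It assumes the PCIe transfer and the host-side copy run one after the other with no overlap, so their times add up. The function name `effective_bandwidth` and both parameters are hypothetical.

```c
/* Effective bandwidth of a two-stage transfer whose stages do not overlap.
 * Assumption: time = size / link_bw + size / copy_bw, so the stage rates
 * combine like resistors in parallel. Both inputs use the same unit
 * (for example GB/s); the result is in that unit too.
 * Returns 0.0 if either input is non-positive. */
double effective_bandwidth(double link_bw, double copy_bw)
{
    if (link_bw <= 0.0 || copy_bw <= 0.0) {
        return 0.0;
    }
    return 1.0 / (1.0 / link_bw + 1.0 / copy_bw);
}
```

Under this model the result is always lower than the slower of the two stages. For example, two stages that each run at 4 GB/s give 2 GB/s overall.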