Are speedy PCIe transfers possible in OpenCL?
I am seeing a huge performance difference between PCIeSpeedTest (v0.2) and any custom OpenCL code I write. With PCIeSpeedTest I easily get close to 6.0 GB/s, while with OpenCL buffers or pinned memory I can't really get past 2.0 GB/s. That is already three times as fast as copying from pageable memory, but still not what I would want to see.
Is there any way to get transfer speeds in OpenCL comparable to the ones reached via CAL (PCIeSpeedTest)?
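
To put the figures above in perspective, here is a rough sketch of the arithmetic behind them. The helper names and the 256 MiB buffer size are assumptions chosen for illustration, not part of any measurement tool, and the pageable rate is only the value implied by the "three times as fast" remark.

```python
def bandwidth_gb_per_s(num_bytes: int, seconds: float) -> float:
    """Throughput in GB/s (1 GB = 1e9 bytes) for a timed transfer."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / seconds / 1e9


def transfer_time_ms(num_bytes: int, gb_per_s: float) -> float:
    """Time in milliseconds needed to move num_bytes at a given rate."""
    if gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return num_bytes / (gb_per_s * 1e9) * 1e3


# Rates quoted in the post, in GB/s.
CAL_RATE = 6.0                    # PCIeSpeedTest (CAL)
PINNED_RATE = 2.0                 # OpenCL buffers / pinned memory
PAGEABLE_RATE = PINNED_RATE / 3   # implied by "three times as fast"

# Hypothetical buffer size, assumed only for this example.
BUFFER_BYTES = 256 * 1024 * 1024

if __name__ == "__main__":
    for name, rate in [("CAL", CAL_RATE),
                       ("OpenCL pinned", PINNED_RATE),
                       ("pageable", PAGEABLE_RATE)]:
        ms = transfer_time_ms(BUFFER_BYTES, rate)
        print(f"{name:14s} {rate:5.2f} GB/s -> {ms:7.1f} ms per 256 MiB")
```

In other words, the OpenCL path is a factor of three behind CAL, which is the same factor by which it is ahead of pageable copies.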