According to the AMD blog post "CPU-to-GPU data transfers exceed 15GB/s using APU zero copy path", one can achieve around 15 GB/s on an APU. I tried this on a Llano machine with an A8-3500M and 8 GB of dual-channel DDR3-1333 memory, running the BufferBandwidth sample as follows:
BufferBandwidth.exe -if 0 -if 5 -nwk 4
This yields around 14 GB/s for the copy on some of the operations, as expected. However, if I run the binary multiple times, performance sometimes drops a LOT: the Map and Unmap operations suddenly start taking much longer, and the copy bandwidth drops as well. The behavior is inconsistent across runs. Sometimes I get the expected high bandwidth, sometimes very low bandwidth. Any idea why this might be the case?
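For reference, here is a back-of-the-envelope check of how the ~14-15 GB/s figures compare with the theoretical peak of this memory configuration. It assumes the standard 64-bit (8-byte) bus per DDR3 channel and an effective rate of 1333.33 MT/s; the function name peak_bandwidth_gbs is just an illustrative helper.

```python
# Theoretical peak bandwidth of DDR3 memory.
# Assumption: each DDR3 channel is 64 bits (8 bytes) wide.
def peak_bandwidth_gbs(mega_transfers_per_s: float, bus_bytes: int, channels: int) -> float:
    """Return theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


# Dual-channel DDR3-1333 (1333.33 MT/s effective).
peak = peak_bandwidth_gbs(1333.33, 8, 2)
print(f"peak: {peak:.1f} GB/s")

# Express the observed copy bandwidths as a fraction of that peak.
for observed in (14.0, 15.0):
    print(f"{observed:.0f} GB/s is {observed / peak:.0%} of peak")
```

Under these assumptions the peak is about 21.3 GB/s, so the "good" runs already sit at roughly two thirds of the theoretical DRAM bandwidth.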