Please find attached the clinfo output from our AMD machine, which contains an AMD FirePro V4800 as a discrete GPU.
The OpenCL rectangular-copy function crashes when the CPU is used as the OpenCL device on the AMD Fusion APU.
Is there any way to work around this?
A rectangular copy from GPU to CPU is very slow when the data is not contiguous in memory. For a 4096x4096 rectangle, copying non-contiguous data takes 6.7 times as long as copying contiguous data. On our NVIDIA Tesla C2050 machine, the same ratio is 1.34.
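To make the contiguous/non-contiguous distinction concrete, here is a host-only C sketch of the addressing that OpenCL's clEnqueueReadBufferRect applies to a 2-D region (depth 1, with the x offset and the width given in bytes, as in the OpenCL spec). rect_copy_2d is a hypothetical helper, not part of OpenCL, and it models only the memory layout, not what the AMD or NVIDIA driver actually does on the bus. When the source row pitch equals the rectangle width, the rectangle is one contiguous block; otherwise it decomposes into one run per row. Whether the runtime really issues one transfer per row is an assumption, not something measured in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modeling the 2-D (depth 1) addressing of
 * clEnqueueReadBufferRect: src_x and width_bytes are in bytes, and
 * consecutive rows are row_pitch bytes apart.
 * Returns the number of contiguous runs that were copied. */
static size_t rect_copy_2d(unsigned char *dst, size_t dst_row_pitch,
                           const unsigned char *src, size_t src_row_pitch,
                           size_t src_x, size_t src_y,
                           size_t width_bytes, size_t height)
{
    /* The rectangle must fit inside one row on both sides. */
    assert(src_x + width_bytes <= src_row_pitch);
    assert(width_bytes <= dst_row_pitch);

    const unsigned char *base = src + src_y * src_row_pitch + src_x;

    if (src_row_pitch == width_bytes && dst_row_pitch == width_bytes) {
        /* Rectangle spans whole rows on both sides: one contiguous block. */
        memcpy(dst, base, width_bytes * height);
        return 1;
    }

    /* Otherwise every row is a separate run, separated by the pitch gap. */
    for (size_t row = 0; row < height; ++row)
        memcpy(dst + row * dst_row_pitch, base + row * src_row_pitch,
               width_bytes);
    return height;
}
```

Assuming 4-byte floats, a 4096x4096 rectangle taken from a wider buffer gives 4096 runs of 16 KiB each in this model, versus a single 64 MiB block when the buffer is exactly 4096 floats wide.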
The results comparing the performance of GPU-to-CPU rectangular copies for different rectangle sizes, with contiguous and with non-contiguous data, on the NVIDIA Tesla C2050 and the AMD FirePro V4800, can be found here:
Rectangular copies from CPU to GPU perform similarly.
What causes such a large slowdown in GPU-to-CPU rectangular copies when the data is not contiguous in memory? Avoiding exactly this kind of slowdown is the reason we use a rectangular copy in the first place.
Is there any way to overcome this? Can we improve the performance of copying non-contiguous memory from GPU to CPU (and from CPU to GPU)?
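One workaround that is often tried (it is not confirmed anywhere in this thread) is to pay the strided cost on the device side: first gather the rectangle into a contiguous scratch buffer in GPU memory, for example with clEnqueueCopyBufferRect, and then move it across the bus with a single contiguous clEnqueueReadBuffer. The C sketch below only emulates the two stages on the host with memcpy to show the data flow; gather_rows and read_packed are hypothetical names, and whether this is actually faster on the FirePro V4800 would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stage 1 (hypothetical name): gather a strided rectangle into a packed
 * staging buffer. On a real device this is the step one might delegate to
 * clEnqueueCopyBufferRect or a small kernel, so it runs in GPU memory. */
static void gather_rows(unsigned char *staging, const unsigned char *src,
                        size_t src_row_pitch, size_t src_x, size_t src_y,
                        size_t width_bytes, size_t height)
{
    assert(src_x + width_bytes <= src_row_pitch);
    for (size_t row = 0; row < height; ++row)
        memcpy(staging + row * width_bytes,
               src + (src_y + row) * src_row_pitch + src_x,
               width_bytes);
}

/* Stage 2 (hypothetical name): one bulk copy of the packed block, standing
 * in for a single contiguous clEnqueueReadBuffer across PCIe. */
static void read_packed(unsigned char *host, const unsigned char *staging,
                        size_t width_bytes, size_t height)
{
    memcpy(host, staging, width_bytes * height);
}
```

For the CPU-to-GPU direction the order would be reversed: one contiguous write into the staging buffer, followed by a device-side scatter into the strided destination.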
The results would be largely affected by the PCI bandwidth. Are the two systems equivalent in this regard?
Also, it would be nice if you could share the code. I am not sure what you mean by a non-contiguous rectangular copy.
clinfo shows it is a dual-GPU V4800; won't the two GPUs share PCI bandwidth? You should probably give more information about the procedure you followed.
We are not comparing absolute numbers. We are comparing ratios, i.e., the relative performance of copying contiguous versus non-contiguous data. Why would PCI bandwidth have an effect?
The code is straightforward. Here is an example with more details:
We compare the performance of these two cases on the same GPU.
We use only one discrete GPU; the other discrete GPU and the APU are idle.