clEnqueueCopyBuffer() COPY_BUFFER takes an incredibly long time on HD7970, MUCH faster on a HD5970

I don't get it. My code runs much slower on a machine with a HD7970 than on a second machine with a HD5970. Using CodeXL, I found that the cause is a copy operation that takes more than 100 times longer on the HD7970. The buffer is only 96KB, and although the copy occurs very often, it should account for less than one percent of the total execution time. But because it takes so long on the HD7970, the whole program slows down to an unacceptable level.

I've written a small test (just to make sure it's not an issue with my actual code) which copies a 100KB array to the device, then copies it to another buffer on the device, and then reads it back to the host (500 times):

int main(int argc, char **argv) {
  int N = 25600;  // 25600 floats = 100KB

  // OpenCLManagement: helper class (not shown) providing getContext() and getQueue()
  OpenCLManagement OpenCL = OpenCLManagement(0, 0, true);

  float *bufferHb = new float[N]();  // host buffer with N zero-initialized floats
  cl_mem bufferADb = clCreateBuffer(*OpenCL.getContext(), CL_MEM_READ_WRITE, N * sizeof(float), NULL, NULL);
  cl_mem bufferBDb = clCreateBuffer(*OpenCL.getContext(), CL_MEM_READ_WRITE, N * sizeof(float), NULL, NULL);

  for(int n = 0; n < 500; n++) {
    clEnqueueWriteBuffer(*OpenCL.getQueue(), bufferADb, CL_FALSE, 0, N * sizeof(float), bufferHb, 0, NULL, NULL);
    clEnqueueCopyBuffer(*OpenCL.getQueue(), bufferADb, bufferBDb, 0, 0, N * sizeof(float), 0, NULL, NULL);
    clEnqueueReadBuffer(*OpenCL.getQueue(), bufferBDb, CL_FALSE, 0, N * sizeof(float), bufferHb, 0, NULL, NULL);
  }

  clFinish(*OpenCL.getQueue());  // wait for all non-blocking commands to complete
  clReleaseMemObject(bufferADb);
  clReleaseMemObject(bufferBDb);
  delete[] bufferHb;
  return 0;
}

The result is:

HD7970:

write buffer: ~155µs

read buffer:  ~155µs

copy buffer: ~120µs

HD5970:

write buffer: ~250µs

read buffer:  ~300µs

copy buffer: ~6µs

I also tested a HD6770 on the second machine, with results comparable to the HD5970. Since the HD7970 is installed in a remote server and I don't have root access, I can't change much on that system.

What could cause this problem on the server machine? I don't know what driver version is installed there, but the APP SDK version is 1.2 (1016.4).

edit: The HD7970 outperforms the HD5970 for buffers of 100MB or more (1µs vs 1.9µs for a copy operation). But for buffers of 10MB or less, the HD7970 is still very slow.

3 Replies

Thanks for reporting the issue. IMHO 1016.4 is Catalyst 12.10, which is close to a year old now. Please install the latest Catalyst 13.8 beta on the 7970 machine (and on the 5970 machine). I will try writing a small test case myself, although if you can provide one, it will speed up reproducing and fixing the bug.

Thanks for the reply. I use the newest software on the HD5970 machine (AMD APP 2.8.1 and Catalyst 13.8). Maybe old drivers are indeed the source of the problem on the HD7970 machine. I will contact the admin and report here how the HD7970 performs with newer drivers.


Thanks for the update. Waiting for your latest results now.