3 Replies Latest reply on Aug 18, 2013 11:43 PM by himanshu.gautam

    clEnqueueCopyBuffer() (COPY_BUFFER) takes an incredibly long time on an HD7970 but is MUCH faster on an HD5970

    cipoint

      I don't get it. My code runs much slower on a machine with an HD7970 than on a second machine with an HD5970. Using CodeXL, I found that this is due to a copy operation, which takes more than 100 times longer on the HD7970. The buffer is only 96KB, and although the copy occurs very often, it should account for less than one percent of the total execution time. But because it takes so long on the HD7970, the whole program slows down to an unacceptable level.

       

      I've written a small test program (just to make sure the problem isn't in my actual code). It copies a 100KB array to the device, copies it to a second buffer on the device, and reads it back to the host, 500 times:

       

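      #include <CL/cl.h>
      // OpenCLManagement is my own helper class (not shown) that sets up
      // the platform, device, context and command queue.

      int main(int argc, char **argv) {
        const int N = 25600;                      // 25600 floats = 102400 bytes (100KB)
        const size_t bytes = N * sizeof(float);

        OpenCLManagement OpenCL(0, 0, true);
        cl_context context = *OpenCL.getContext();
        cl_command_queue queue = *OpenCL.getQueue();

        float *bufferHb = new float[N]();         // host buffer, zero-initialized
        cl_mem bufferADb = clCreateBuffer(context, CL_MEM_READ_WRITE, bytes, NULL, NULL);
        cl_mem bufferBDb = clCreateBuffer(context, CL_MEM_READ_WRITE, bytes, NULL, NULL);

        for (int n = 0; n < 500; n++) {
          // host -> device buffer A (non-blocking; an in-order queue runs the three commands in sequence)
          clEnqueueWriteBuffer(queue, bufferADb, CL_FALSE, 0, bytes, bufferHb, 0, NULL, NULL);
          // device buffer A -> device buffer B: the COPY_BUFFER command that is slow on the HD7970
          clEnqueueCopyBuffer(queue, bufferADb, bufferBDb, 0, 0, bytes, 0, NULL, NULL);
          // device buffer B -> host
          clEnqueueReadBuffer(queue, bufferBDb, CL_FALSE, 0, bytes, bufferHb, 0, NULL, NULL);
        }

        clFinish(queue);                          // wait for all 1500 commands to complete

        clReleaseMemObject(bufferADb);
        clReleaseMemObject(bufferBDb);
        delete[] bufferHb;
        return 0;
      }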

       

      The result is:

      Operation       HD7970     HD5970
      write buffer    ~155µs     ~250µs
      read buffer     ~155µs     ~300µs
      copy buffer     ~120µs     ~6µs

      I also tested an HD6770 on the second machine, with results comparable to the HD5970. Since the HD7970 is installed in a remote server and I don't have root access, I can't change much on that system.

       

      What could cause this problem on the server machine? I don't know which driver version is installed there, but the APP SDK version is 1.2 (1016.4).

       

      edit: The HD7970 outperforms the HD5970 for buffers of 100MB or more (1µs vs. 1.9µs per copy operation). For buffers of 10MB or less, however, the HD7970 is still very slow.