3 Replies Latest reply on Sep 27, 2010 1:55 PM by LeeHowes

    Writing image to opencl slow

    Tasp

      Device: HD5850

       

      cl::Image2D bufferIn(context, CL_MEM_READ_WRITE, SingleIntensity(), bufferDimension, bufferDimension);

      ...

      Profile p;
      p.start("upload");
      commandQueue.enqueueWriteImage(
                      imageBuffer, CL_TRUE,
                      o, s, // o = 0,0,0 s = width,height,1
                      bufferDimension * sizeof(float), 0,
                      pBuffer);
      p.stop();



      where SingleIntensity() is cl::ImageFormat(CL_INTENSITY, CL_FLOAT)
      and pBuffer points to a float[bufferDimension * bufferDimension].

      bufferDimension is 5000, and uploading a 3168x4752 image takes 0.233590s.
      3168*4752*4 / 0.233590s  = 0.257790762 GB/s
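The arithmetic above can be sketched as a small helper. This is only an illustration: the names `imageBytes` and `gigabytesPerSecond` are hypothetical and not part of OpenCL or the ATI Stream SDK. It assumes 1 GB = 10^9 bytes, which is the convention that reproduces the 0.2578 GB/s figure in the post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes transferred for a width x height image
// with bytesPerPixel bytes per pixel (4 for a single float channel).
constexpr std::size_t imageBytes(std::size_t width,
                                 std::size_t height,
                                 std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Hypothetical helper: throughput in GB/s, assuming 1 GB = 1e9 bytes.
inline double gigabytesPerSecond(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);  // a zero or negative duration is a measurement error
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

For example, `gigabytesPerSecond(imageBytes(3168, 4752, sizeof(float)), 0.233590)` evaluates to roughly 0.2578, matching the number quoted above.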

      That seems much too slow. Which steps would you suggest to track down the speed problem? (Ubuntu 10.04, newest SDK/driver)

      *edit*
      The speed reported by the SDK sample seems OK:

      ati-stream-sdk/samples/opencl/bin/x86_64$ GPU_MAX_HEAP_SIZE=90 ./PCIeBandwidth --device gpu --timing --verify --iterations 5 --length $((3168*4752*4))
      Host to device : 2.0552 GB/s
      Device to host : 1.03913 GB/s
      Passed!


        • Writing image to opencl slow
          LeeHowes

          Did you warm up the runtime before doing that timing?

          I.e., do something first: another copy, or maybe enqueue a kernel. Then call clFinish, and only after that do the timed image copy. Does that make any difference?

            • Writing image to opencl slow
              Tasp

              No, I didn't, but I have now:

              Actually I uploaded the image and wrote to the buffer 10 times before profiling it.

              These are the results:

               

              Start: upload image
              Stop: upload image (0.223629s)

              Start: upload buffer
              Stop: upload buffer (0.060787s)

              Start: copy buffer to image
              Stop: copy buffer to image (0.301985s)


              code now looks like this:

               

                  for ( ... i < 10 ...) // warmup
                  {

                      commandQueue.enqueueWriteImage(...);
                      commandQueue.enqueueWriteBuffer(...);
                  }

                  p.start("upload image");
                  commandQueue.enqueueWriteImage(
                          imageBuffer, CL_TRUE,
                          o.get(),
                          s.get(),
                          0,
                          0,
                          image.getPointer()
                  );
                  p.stop();

                  p.start("upload buffer");
                  commandQueue.enqueueWriteBuffer(
                          buffer, CL_TRUE,
                          0,
                          size,
                          image.getPointer()
                  );
                  p.stop();

                  p.start("copy buffer to image");
                  commandQueue.enqueueCopyBufferToImage(
                          buffer,
                          imageBuffer,
                          0,
                          o.get(),
                          s.get()
                          );
                  commandQueue.finish();
                  p.stop();



              Image: 3168*4752*4 / 0.223629s  = 0.269273413 GB/s
              Buffer: 3168*4752*4 / 0.060787s = 0.990628654 GB/s
              Copy Buffer -> Image: 3168*4752*4 / 0.301985s = 0.199405083 GB/s

                • Writing image to opencl slow
                  LeeHowes

                  Are those enqueues at the top blocking? They need to actually push work through the queue rather than just sit there waiting. In code like this it is best to call finish right after the warm-up loop and wait for it to return before starting the timer.

                  Other than that, maybe there is a problem with image upload performance under Linux. I will enquire.
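The measurement pattern suggested in this reply (warm up, drain the queue, then time a single operation) could look roughly like the sketch below. It is deliberately independent of OpenCL so that it compiles on its own: `timeAfterWarmup` is a hypothetical name, `op` stands in for a call such as `commandQueue.enqueueWriteImage(...)`, and `sync` stands in for `commandQueue.finish()`. How the real enqueue calls are wired in is an assumption left to the reader.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `op` warmupRuns times, waits for that work to
// complete via `sync`, then times one more `op` followed by `sync`.
// Returns the elapsed wall-clock time of the timed run in seconds.
inline double timeAfterWarmup(const std::function<void()>& op,
                              const std::function<void()>& sync,
                              int warmupRuns) {
    assert(warmupRuns >= 0);

    for (int i = 0; i < warmupRuns; ++i) {
        op();   // stand-in for e.g. commandQueue.enqueueWriteImage(...)
    }
    sync();     // stand-in for commandQueue.finish(): warm-up work must be
                // done before the clock starts

    const auto start = std::chrono::steady_clock::now();
    op();
    sync();     // make sure the timed operation has really completed
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `sync` both before starting the clock and before stopping it keeps leftover warm-up work out of the measurement and ensures a non-blocking enqueue is not timed as if it were free.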