
Tasp
Journeyman III

Writing an image to OpenCL is slow

Device: HD5850

cl::Image2D bufferIn(context, CL_MEM_READ_WRITE, SingleIntensity(), bufferDimension, bufferDimension);

...

Profile p;
p.start("upload");
commandQueue.enqueueWriteImage(
                imageBuffer, CL_TRUE,
                o, s, // o = {0, 0, 0} (origin), s = {width, height, 1} (region)
                bufferDimension * sizeof(float), 0,
                pBuffer);
p.stop();



where SingleIntensity() is cl::ImageFormat(CL_INTENSITY, CL_FLOAT) and pBuffer points to a float[bufferDimension * bufferDimension].
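
To illustrate what the row pitch argument (bufferDimension * sizeof(float)) means for the host layout: it tells the runtime that consecutive rows in pBuffer start 5000 floats apart, even though only `width` pixels of each row are transferred. Per the OpenCL specification for clEnqueueWriteImage, a row pitch of 0 means tightly packed rows (width * element size). The sketch below is plain C++ with hypothetical helper names (effectiveRowPitch, hostByteOffset); it is not part of the OpenCL API and assumes one 4-byte float per pixel, as with CL_INTENSITY / CL_FLOAT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the row pitch the runtime uses for host memory.
// A pitch of 0 means "tightly packed", i.e. width * elementSize bytes per row.
std::size_t effectiveRowPitch(std::size_t rowPitch, std::size_t width,
                              std::size_t elementSize)
{
    return rowPitch != 0 ? rowPitch : width * elementSize;
}

// Hypothetical helper: byte offset of pixel (x, y) inside the host buffer.
std::size_t hostByteOffset(std::size_t x, std::size_t y, std::size_t rowPitch,
                           std::size_t width, std::size_t elementSize)
{
    assert(x < width);
    const std::size_t pitch = effectiveRowPitch(rowPitch, width, elementSize);
    // A non-zero pitch must cover at least one full row of pixels.
    assert(pitch >= width * elementSize);
    return y * pitch + x * elementSize;
}
```

With a pitch of 5000 * sizeof(float), row 1 starts at byte 20000 of pBuffer; with a pitch of 0 and a width of 3168, it starts at byte 12672.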

bufferDimension is 5000. For a 3168x4752 image the upload takes 0.233590 s:
3168*4752*4 bytes / 0.233590 s = 0.257790762 GB/s
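
The arithmetic above can be written as a small helper, which makes it easier to repeat for other measurements. This is only a sketch: transferRateGBps is a hypothetical name, and it assumes 1 GB = 1e9 bytes, which is the convention the figure above implies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: transfer rate in GB/s, with 1 GB = 1e9 bytes.
double transferRateGBps(std::size_t width, std::size_t height,
                        std::size_t bytesPerPixel, double seconds)
{
    assert(seconds > 0.0);
    // Convert to double first so large images cannot overflow an integer type.
    const double bytes = static_cast<double>(width) * height * bytesPerPixel;
    return bytes / seconds / 1e9;
}
```

Calling transferRateGBps(3168, 4752, sizeof(float), 0.233590) reproduces the figure of roughly 0.258 GB/s.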

Much too slow?! Which steps would you suggest to track down the speed problem? (Ubuntu 10.4, newest SDK/driver)

*edit*
The transfer speed reported by the SDK sample seems OK:

ati-stream-sdk/samples/opencl/bin/x86_64$ GPU_MAX_HEAP_SIZE=90 ./PCIeBandwidth --device gpu --timing --verify --iterations 5 --length $((3168*4752*4))
Host to device : 2.0552 GB/s
Device to host : 1.03913 GB/s
Passed!


LeeHowes
Staff

Did you warm up the runtime before doing that timing?

I.e., do something first: another copy, or maybe enqueue a kernel. Do a clFinish. Then do the image copy and see whether that makes any difference.
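
A minimal sketch of that measurement pattern, assuming nothing about the poster's own Profile class: run the operation a few times, synchronize, and only then time one more run that ends with a second synchronization. timeAfterWarmup is a hypothetical helper written in plain C++; the operation and the synchronization step are passed in as callables.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of the OpenCL API.
// op   : the operation to measure (e.g. an image upload)
// sync : waits until all previously issued work has completed
// Returns the wall-clock time in seconds of one op() + sync() after warm-up.
template <typename Op, typename Sync>
double timeAfterWarmup(Op op, Sync sync, int warmupRuns)
{
    assert(warmupRuns >= 0);
    for (int i = 0; i < warmupRuns; ++i) {
        op(); // warm-up runs, not timed
    }
    sync(); // drain all warm-up work before starting the clock

    const auto start = std::chrono::steady_clock::now();
    op();
    sync(); // the timed operation must have finished before the clock stops
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the OpenCL C++ wrapper, op could be a lambda that calls commandQueue.enqueueWriteImage(...) and sync a lambda that calls commandQueue.finish().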


No, I didn't, but I have done it now:

Actually I uploaded the image and wrote to the buffer 10 times before profiling it.

These are the results:

Start: upload image
Stop: upload image (0.223629s)

Start: upload buffer
Stop: upload buffer (0.060787s)

Start: copy buffer to image
Stop: copy buffer to image (0.301985s)


The code now looks like this:

    for ( ... i < 10 ...) // warmup
    {

        commandQueue.enqueueWriteImage(...);
        commandQueue.enqueueWriteBuffer(...);
    }

    p.start("upload image");
    commandQueue.enqueueWriteImage(
            imageBuffer, CL_TRUE,
            o.get(),
            s.get(),
            0,
            0,
            image.getPointer()
    );
    p.stop();

    p.start("upload buffer");
    commandQueue.enqueueWriteBuffer(
            buffer, CL_TRUE,
            0,
            size,
            image.getPointer()
    );
    p.stop();

    p.start("copy buffer to image");
    commandQueue.enqueueCopyBufferToImage(
            buffer,
            imageBuffer,
            0,
            o.get(),
            s.get()
            );
    commandQueue.finish();
    p.stop();



Image: 3168*4752*4 bytes / 0.223629 s = 0.269273413 GB/s
Buffer: 3168*4752*4 bytes / 0.060787 s = 0.990628654 GB/s
Copy Buffer -> Image: 3168*4752*4 bytes / 0.301985 s = 0.199405083 GB/s


Are those enqueues at the top blocking? They need to actually push work through the queue rather than just sit there waiting. In code like this it is best to call finish and wait right after the warm-up.

Other than that, maybe there is a problem with image upload performance under Linux. I will enquire.
