7 Replies Latest reply on Sep 22, 2010 12:04 PM by nou

    Very slow CopyBuffer

    Raistmer
      How can I speed it up?

      I need to copy 8 MB of data between two buffers, both of which are supposed to be on the GPU.
      I use clEnqueueCopyBuffer for this operation.
      But in the profiler I see very poor performance for this copy.
      It takes more than 10 ms (almost 11 ms) to copy 8 MB of data from one GPU memory location to another. That is under 1 GB/s. From everything I have read about GPU capabilities, I expected a much higher number.
      What can be done to speed up the memory copy? Should I implement a memcpy kernel instead of calling clEnqueueCopyBuffer?
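For reference, the bandwidth figure above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `copy_bandwidth` is made up for illustration, 8 MB is assumed to mean 8 × 1024 × 1024 bytes, and 11 ms is the rounded profiler reading from the post.

```python
def copy_bandwidth(num_bytes, seconds):
    """Return the effective bandwidth in bytes per second."""
    return num_bytes / seconds

size_bytes = 8 * 1024 * 1024   # 8 MB, assuming binary megabytes
elapsed_s = 11e-3              # ~11 ms, as read from the profiler

bw = copy_bandwidth(size_bytes, elapsed_s)
print(f"{bw / 1e9:.2f} GB/s")      # decimal gigabytes per second
print(f"{bw / 2**30:.2f} GiB/s")   # binary gibibytes per second
```

Either way the result is roughly 0.7 to 0.8 GB/s, which matches the "under 1 GB/s" estimate.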
        • Very slow CopyBuffer
          n0thing

          Which GPU are you using?

          My copy bandwidth test gives the following results on a Mobility Redwood (5650):

          Device[4] OpenCLPerfCopySpeed[ 0] Passed 11.9252 GBps [Buffer Size = 4 MB]
          Device[4] OpenCLPerfCopySpeed[ 1] Passed 18.5277 GBps [Buffer Size = 8 MB]
          Device[4] OpenCLPerfCopySpeed[ 2] Passed 19.3910 GBps [Buffer Size = 16 MB]
          Device[4] OpenCLPerfCopySpeed[ 3] Passed 19.9996 GBps [Buffer Size = 32 MB]
          Device[4] OpenCLPerfCopySpeed[ 4] Passed 20.3254 GBps [Buffer Size = 64 MB]

          • Very slow CopyBuffer
            Raistmer
            As always, I use an HD4870 GPU. And my result is not from some benchmark, but from profiling a real-world app.
            There are a few possibilities, of course: maybe the profiler just reports wrong timings (but why?), or maybe the buffers were created in a suboptimal way (in which case I would appreciate some ideas on how to create them better). In short, I want fast memory transfers in my particular app, not a benchmark telling me the GPU can do it fast. I know it can (in a benchmark).
              • Very slow CopyBuffer
                n0thing

                This is what I get on a 4870. Advice: buy a 5xxx GPU or one from the Northern Islands series, which is going to launch soon.

                Device[4] OpenCLPerfCopySpeed[ 0] Passed 1.5776 GBps [Buffer Size = 4 MB]
                Device[4] OpenCLPerfCopySpeed[ 1] Passed 1.6178 GBps [Buffer Size = 8 MB]
                Device[4] OpenCLPerfCopySpeed[ 2] Passed 1.6439 GBps [Buffer Size = 16 MB]
                Device[4] OpenCLPerfCopySpeed[ 3] Passed 1.6556 GBps [Buffer Size = 32 MB]
                Device[4] OpenCLPerfCopySpeed[ 4] Passed 1.6622 GBps [Buffer Size = 64 MB]
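To relate these figures to a per-copy time, the output can be parsed and converted. This is a rough sketch under stated assumptions: `parse_results` and `LINE_RE` are illustrative names, "GBps" is assumed to mean 10^9 bytes per second, "MB" is assumed to mean 2^20 bytes, and the benchmark is assumed to count each copied byte once. None of that is confirmed by the benchmark output itself.

```python
import re

# Matches e.g. "Passed 1.6178 GBps [Buffer Size = 8 MB]"
LINE_RE = re.compile(r"Passed\s+([\d.]+)\s+GBps\s+\[Buffer Size = (\d+) MB\]")

def parse_results(text):
    """Return a dict mapping buffer size in MB to reported GBps."""
    return {int(size): float(gbps) for gbps, size in LINE_RE.findall(text)}

hd4870 = parse_results("""
Device[4] OpenCLPerfCopySpeed[ 0] Passed 1.5776 GBps [Buffer Size = 4 MB]
Device[4] OpenCLPerfCopySpeed[ 1] Passed 1.6178 GBps [Buffer Size = 8 MB]
Device[4] OpenCLPerfCopySpeed[ 2] Passed 1.6439 GBps [Buffer Size = 16 MB]
Device[4] OpenCLPerfCopySpeed[ 3] Passed 1.6556 GBps [Buffer Size = 32 MB]
Device[4] OpenCLPerfCopySpeed[ 4] Passed 1.6622 GBps [Buffer Size = 64 MB]
""")

# Implied time for one 8 MB copy, in milliseconds (assumptions as above).
ms_8mb = 8 * 2**20 / (hd4870[8] * 1e9) * 1e3
print(f"8 MB copy at {hd4870[8]} GBps: about {ms_8mb:.1f} ms")
```

Under those assumptions an 8 MB copy would take on the order of 5 ms on this card, so the ~11 ms seen in the profiler is slower than the benchmark figure but within the same order of magnitude.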

              • Very slow CopyBuffer
                Raistmer
                Thanks.
                But your "buy a new card" advice is not applicable because of the specifics of my app. It has to work on GPUs people already have, and it should not require the user to upgrade. That is the very essence of BOINC: use already available resources rather than spend money on dedicated hardware.
                For personal use I'd rather buy an NVIDIA GPU.
                • Very slow CopyBuffer
                  Raistmer
                  BTW, the low speed you see, under 2 GB/s: is it a hardware limit or just a poor OpenCL implementation again? If CAL/IL were used, would the speed remain the same?