Raistmer
Adept II

Very slow CopyBuffer

How can I speed it up?

I need to copy 8 MB of data between two buffers, both of which should reside on the GPU.
I use clEnqueueCopyBuffer for this operation.
But in the profiler I see very poor performance for this copy.
It takes more than 10 ms (almost 11 ms) to copy 8 MB of data from one GPU memory location to another. That is less than 1 GB/s. From everything I have read about GPU capabilities, I expected a much higher number.
What can be done to speed up the memory copy? Should I implement a memcpy kernel instead of calling clEnqueueCopyBuffer?
0 Likes
7 Replies
n0thing
Journeyman III

Which GPU are you using?

My copy bandwidth test shows the following info on mobility Redwood (5650)

Device[4] OpenCLPerfCopySpeed[ 0] Passed 11.9252 GBps [Buffer Size = 4 MB]
Device[4] OpenCLPerfCopySpeed[ 1] Passed 18.5277 GBps [Buffer Size = 8 MB]
Device[4] OpenCLPerfCopySpeed[ 2] Passed 19.3910 GBps [Buffer Size = 16 MB]
Device[4] OpenCLPerfCopySpeed[ 3] Passed 19.9996 GBps [Buffer Size = 32 MB]
Device[4] OpenCLPerfCopySpeed[ 4] Passed 20.3254 GBps [Buffer Size = 64 MB]

0 Likes
Raistmer
Adept II

As always, I am using an HD4870 GPU. And my result comes not from a benchmark but from profiling a real-world app.
There are a few possibilities, of course: maybe the profiler just gets the timings wrong (but why?), or maybe the buffers were created in a suboptimal way (in which case I would appreciate ideas on how to create them better). In short, I want fast memory transfer in my particular app, not a benchmark that says the GPU can do it fast. I know it can (in a benchmark).
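
One way to cross-check the profiler is to read the copy event's own timestamps and convert them to throughput by hand. The helper below is only a sketch of that conversion: the name `copy_gb_per_s` is made up for illustration, and it assumes the two timestamps are nanosecond values such as those reported by clGetEventProfilingInfo for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a queue created with CL_QUEUE_PROFILING_ENABLE.

```c
#include <stdint.h>

/* Convert a pair of event timestamps (in nanoseconds) into copy throughput.
 * Assumption: in a real host program the two values would be read with
 * clGetEventProfilingInfo (CL_PROFILING_COMMAND_START / _END) from the
 * event returned by clEnqueueCopyBuffer. Here they are plain integers. */
double copy_gb_per_s(uint64_t start_ns, uint64_t end_ns, uint64_t bytes)
{
    if (end_ns <= start_ns)
        return 0.0;                          /* invalid or empty interval */
    double seconds = (double)(end_ns - start_ns) * 1e-9;
    return (double)bytes / seconds / 1e9;    /* decimal GB per second */
}
```

Feeding it an 11 ms interval and 8 * 1024 * 1024 bytes gives a figure below 1 GB/s, in line with what the profiler reported; if hand-read timestamps gave a very different number, the profiler itself would be suspect.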
0 Likes

This is what I get on a 4870. My advice: buy a 5xxx GPU or one from the Northern Islands series, which is going to launch soon.

Device[4] OpenCLPerfCopySpeed[ 0] Passed 1.5776 GBps [Buffer Size = 4 MB]
Device[4] OpenCLPerfCopySpeed[ 1] Passed 1.6178 GBps [Buffer Size = 8 MB]
Device[4] OpenCLPerfCopySpeed[ 2] Passed 1.6439 GBps [Buffer Size = 16 MB]
Device[4] OpenCLPerfCopySpeed[ 3] Passed 1.6556 GBps [Buffer Size = 32 MB]
Device[4] OpenCLPerfCopySpeed[ 4] Passed 1.6622 GBps [Buffer Size = 64 MB]
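
To relate these benchmark rates to the copy time reported at the top of the thread, the rate can be turned back into an expected duration. This is a rough sketch only: `expected_copy_ms` is a hypothetical helper, and it assumes that "GBps" means 10^9 bytes per second and that "8 MB" means 8 * 1024 * 1024 bytes.

```c
/* Expected duration, in milliseconds, of copying `bytes` bytes at a
 * sustained rate of `gb_per_s` (assumed to mean 1e9 bytes per second). */
double expected_copy_ms(double bytes, double gb_per_s)
{
    return bytes / (gb_per_s * 1e9) * 1e3;
}
```

Under those assumptions, 8 MB at the 1.6178 GBps measured here works out to a little over 5 ms, roughly half of the almost 11 ms seen in the profiled app, so the application may be losing time beyond what this benchmark shows for the same GPU model.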

0 Likes

If you want to check the bandwidth obtained with a kernel copy, see the MemoryOptimizations sample in SDK 2.2. I have seen low memory transfer speeds on 7xx series GPUs.

0 Likes
Raistmer
Adept II

Thanks.
But your "buy a new card" advice is not applicable because of the specifics of my app. It should work on GPUs that users already have, and it should not require anyone to upgrade their GPU. That is the very essence of BOINC: use already available resources rather than spend money on dedicated hardware.
For personal use I'd rather buy an NV GPU.
0 Likes
Raistmer
Adept II

BTW, the low speed you see, <2 GB/s: is it a hardware limit or just a poor OpenCL implementation again? If CAL/IL were used, would the speed remain the same?
0 Likes

Well, 5xxx GPUs get much more attention in OpenCL. In the meantime, IMHO you should go with a memcpy kernel.
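
For reference, a minimal sketch of what such a memcpy kernel could look like. The kernel name `copy_words` and the host-side emulation around it are illustrative assumptions, not code from the SDK: the `#ifndef __OPENCL_VERSION__` parts exist only so the kernel body can be compiled and checked as plain C without an OpenCL runtime. No performance claim is implied; whether it beats clEnqueueCopyBuffer on a given GPU would have to be measured.

```c
/* Outside an OpenCL compiler, stub the qualifiers and emulate
 * get_global_id() so the kernel body can be checked on the host. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t emulated_gid;   /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned int dim) { (void)dim; return emulated_gid; }
#endif

/* One work-item copies one 32-bit word; launch at least n work-items
 * in dimension 0. */
__kernel void copy_words(__global const unsigned int *src,
                         __global unsigned int *dst,
                         unsigned int n)
{
    size_t i = get_global_id(0);
    if (i < n)                /* guard: global size may be rounded up */
        dst[i] = src[i];
}

#ifndef __OPENCL_VERSION__
/* Host-side emulation of a 1-D launch with global_size work-items. */
static void run_copy_words(const unsigned int *src, unsigned int *dst,
                           unsigned int n, size_t global_size)
{
    for (emulated_gid = 0; emulated_gid < global_size; ++emulated_gid)
        copy_words(src, dst, n);
}
#endif
```

In a real program only the `copy_words` function would go into the kernel source, launched with clEnqueueNDRangeKernel over a 1-D range. Copying wider elements per work-item (for example a vector type such as uint4) may be worth trying as a variation.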

0 Likes