A simple question.
I collected an application trace, and the stream timeline displayed the information below (GPU: Radeon HD 5870):
Name: 12.0 KB READ_BUFFER
Command Type: CL_COMMAND_READ_BUFFER
Queued Time:2704.881 millisecond
Submit Time:2704.891 millisecond
Start Time:2705.210 millisecond
End Time: 2705.295 millisecond
Duration: 85.280 microseconds
clEnqueue API Name: clEnqueueReadBuffer
clEnqueue API Start Time: 2704.875 millisecond
clEnqueue API End Time: 2705.333 millisecond
clEnqueue API Duration: 458.259 microseconds
Transfer Rate: 137.415 MB/s
Transfer Size: 12.000 KB
Why is the transfer rate only 137.415 MB/s? Any comments on this?
Try copying a larger chunk of data, and repeat the transfer several times before calculating the transfer rate.
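For reference, the reported figure is consistent with transfer size divided by the command's Start-to-End duration: 12288 bytes over 85.280 microseconds. The sketch below reproduces that arithmetic, assuming the profiler's "MB" means 2^20 bytes (an assumption, but it is what makes the numbers match). Measured instead over the whole clEnqueueReadBuffer call (458.259 microseconds), the same 12 KB works out to only about 25.6 MiB/s, which suggests fixed per-command overhead, not bus bandwidth, dominates a transfer this small.

```c
/* Bytes per mebibyte (2^20). Assumption: the profiler's "MB/s" uses this
   unit, since 12288 bytes / 85.28 us gives about 137.415 in MiB/s. */
#define BYTES_PER_MIB 1048576.0

/* Transfer rate in MiB/s for `bytes` moved in `duration_us` microseconds. */
double rate_mib_per_s(double bytes, double duration_us)
{
    return (bytes / BYTES_PER_MIB) / (duration_us * 1e-6);
}
```

For example, `rate_mib_per_s(12288.0, 85.280)` uses the command's Duration field, while `rate_mib_per_s(12288.0, 458.259)` uses the clEnqueue API Duration. Copying a larger chunk spreads the same fixed overhead over more bytes, so the ratio moves closer to the link's actual bandwidth.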
There can be many reasons for such a low transfer rate, including GPU warm-up time, memory access patterns, and channel conflicts. Refer to the Global Memory Bandwidth Sample to see how to achieve better memory transfer rates.
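To follow the advice about repeated transfers and warm-up, one option is to time several transfers and compute one aggregate rate (total bytes over total busy time), discarding the first few as warm-up. This is a minimal sketch, not the SDK sample's method: the `transfer_sample` type, the `warmup` parameter, and the function name are hypothetical. In a real host program, the timestamps would come from `clGetEventProfilingInfo` with `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` (both in nanoseconds) on a queue created with `CL_QUEUE_PROFILING_ENABLE`; that OpenCL code is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one profiled transfer: byte count plus the
   command's start/end timestamps in nanoseconds. */
typedef struct {
    uint64_t bytes;
    uint64_t start_ns;
    uint64_t end_ns;
} transfer_sample;

/* Aggregate rate in MiB/s over samples[warmup..n-1]: total bytes divided by
   total busy time. The first `warmup` samples are ignored. Returns 0.0 if
   no measurable time remains. */
double aggregate_rate_mib_per_s(const transfer_sample *samples, size_t n,
                                size_t warmup)
{
    uint64_t total_bytes = 0;
    uint64_t total_ns = 0;
    for (size_t i = warmup; i < n; ++i) {
        if (samples[i].end_ns <= samples[i].start_ns)
            continue; /* skip malformed or zero-length samples */
        total_bytes += samples[i].bytes;
        total_ns += samples[i].end_ns - samples[i].start_ns;
    }
    if (total_ns == 0)
        return 0.0;
    return ((double)total_bytes / 1048576.0) / ((double)total_ns * 1e-9);
}
```

Summing bytes and time before dividing, rather than averaging per-transfer rates, keeps one slow or fast outlier from skewing the result in proportion to its count instead of its duration.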