I have an AMD A8-3850 Fusion APU. I have installed the AMD APP SDK 2.6 and am trying out the BufferBandwidth sample to check the maximum
bandwidth that I can get on the discrete and the on-die GPU.
I have no issues with the discrete GPU. For the on-die GPU, however, this blog
as well as the AMD APP programming guide talk about a zero-copy path that can reach up to 15 GB/s.
First, I would like to clarify whether the API referred to in the blog and the APP guide is "clMapBuffer" or really "clEnqueueMapBuffer".
There is no API called clMapBuffer in the library, so I am assuming it is clEnqueueMapBuffer.
If that is true, I am running the BufferBandwidth sample that came with the SDK with various options to see if I can get up to 15 GB/s.
So far I have only reached about 8 GB/s.
As suggested in a comment on the above blog, I even tried -nwk 15, but I still see no improvement.
Here is my command line for a buffer size of 128 MB:
./BufferBandwidth -t 2 -d 0 -nwk 15 -nl 20 -nr 1 -nk 20 -nb 134217728 -nw 7 -s 2 -if 0 -of 1 -cf 5 -cf 2
And here is the output for the integrated GPU:
Device 0 BeaverCreek
GPU work items: 8192
Buffer size: 134217728
CPU workers: 1
Timing loops: 20
Kernel loops: 20
Host baseline (single thread, naive):
Timer resolution 256.225 ns
Page fault 2047.7
CPU read 6.16153 GB/s
memcpy() 3.31839 GB/s
memset(,1,) 9.00288 GB/s
memset(,0,) 8.81274 GB/s
AVERAGES (over loops 2 - 19, use -l for complete log)
1. Host mapped write to copyBuffer
clEnqueueMapBuffer(WRITE): 0.000016 s [ 8285.50 GB/s ]
memset(): 0.014970 s 8.97 GB/s
clEnqueueUnmapMemObject(): 0.000084 s [ 1604.42 GB/s ]
2. CL copy of copyBuffer to inputBuffer
clEnqueueCopyBuffer: 0.042168 s 3.18 GB/s
3. GPU kernel read of inputBuffer
clEnqueueNDRangeKernel(): 0.471207 s 5.70 GB/s
4. GPU kernel write to outputBuffer
clEnqueueNDRangeKernel(): 0.665229 s 4.04 GB/s
5. CL copy of outputBuffer to copyBuffer
clEnqueueCopyBuffer: 0.041343 s 3.25 GB/s
6. Host mapped read of copyBuffer
clEnqueueMapBuffer(READ): 0.000017 s [ 7710.12 GB/s ]
CPU read: 0.023021 s 5.83 GB/s
clEnqueueUnmapMemObject(): 0.000089 s [ 1506.57 GB/s ]
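As a sanity check on the figures above, here is a minimal Python sketch that recomputes the reported bandwidths from the buffer size and the timings in the log. It assumes 1 GB = 1e9 bytes and that the kernel timings in steps 3 and 4 cover all 20 kernel loops; both assumptions are inferred from the fact that the numbers then match the log, not taken from the BufferBandwidth source. The helper name bandwidth_gbps is made up for this illustration.

```python
def bandwidth_gbps(buffer_bytes, seconds, passes=1):
    """Return bandwidth in GB/s, assuming 1 GB = 1e9 bytes (an assumption)."""
    return buffer_bytes * passes / seconds / 1e9

BUFFER_BYTES = 134217728  # "Buffer size" in the log (the -nb value)
KERNEL_LOOPS = 20         # "Kernel loops" in the log (the -nk value)

# Timings copied from the AVERAGES section of the log.
memset_bw = bandwidth_gbps(BUFFER_BYTES, 0.014970)                      # step 1
copy_bw = bandwidth_gbps(BUFFER_BYTES, 0.042168)                        # step 2
kernel_read_bw = bandwidth_gbps(BUFFER_BYTES, 0.471207, KERNEL_LOOPS)   # step 3
kernel_write_bw = bandwidth_gbps(BUFFER_BYTES, 0.665229, KERNEL_LOOPS)  # step 4

for name, value in [
    ("memset()", memset_bw),
    ("clEnqueueCopyBuffer", copy_bw),
    ("kernel read", kernel_read_bw),
    ("kernel write", kernel_write_bw),
]:
    print(f"{name:22s} {value:.2f} GB/s")
```

Under these assumptions the recomputed values agree with the log to two decimal places, so the highest bandwidth measured in this run is the host memset() at about 9 GB/s, and the GPU kernel read and write stay well below the 15 GB/s quoted for the zero-copy path.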
Is there anything else that I need to enable in order to get higher bandwidths?
There is no clMapBuffer function in OpenCL. IMHO the name is simply shortened in the documentation, since almost all operations are enqueued.
EDIT: Just a correction: 7xxx cards should support zero copy on Linux.