Hi,
I have an AMD A8-3850 Fusion APU. I have installed the AMD APP SDK 2.6 and am trying out the BufferBandwidth sample to check the maximum
bandwidth that I can get on the discrete and the on-die GPU.
I have no issues with the discrete GPU. But for the on-die GPU, this blog
as well as the AMD APP programmer's guide describe a zero-copy path that can reach up to 15 GB/s.
First, I would like to clarify whether the API referred to in the blog and the APP guide is "clMapBuffer" or really "clEnqueueMapBuffer".
There is no API called clMapBuffer in the library, so I am assuming it is clEnqueueMapBuffer.
If that is true, I am running the BufferBandwidth sample that came with the SDK with various options to see if I can get up to 15 GB/s.
But so far I have only reached about 8 GB/s.
As suggested in a comment on the above blog, I even tried setting -nwk to 15, but I still see no improvement.
Here is my command line for a buffer size of 128 MB:
./BufferBandwidth -t 2 -d 0 -nwk 15 -nl 20 -nr 1 -nk 20 -nb 134217728 -nw 7 -s 2 -if 0 -of 1 -cf 5 -cf 2
And here is the output for the integrated GPU:
=======================================================
Device 0 BeaverCreek
Build: DEBUG
GPU work items: 8192
Buffer size: 134217728
CPU workers: 1
Timing loops: 20
Repeats: 1
Kernel loops: 20
inputBuffer: CL_MEM_READ_ONLY
outputBuffer: CL_MEM_WRITE_ONLY
copyBuffer: CL_MEM_READ_WRITE CL_MEM_ALLOC_HOST_PTR
Host baseline (single thread, naive):
Timer resolution 256.225 ns
Page fault 2047.7
CPU read 6.16153 GB/s
memcpy() 3.31839 GB/s
memset(,1,) 9.00288 GB/s
memset(,0,) 8.81274 GB/s
AVERAGES (over loops 2 - 19, use -l for complete log)
--------
1. Host mapped write to copyBuffer
clEnqueueMapBuffer(WRITE): 0.000016 s [ 8285.50 GB/s ]
memset(): 0.014970 s 8.97 GB/s
clEnqueueUnmapMemObject(): 0.000084 s [ 1604.42 GB/s ]
2. CL copy of copyBuffer to inputBuffer
clEnqueueCopyBuffer: 0.042168 s 3.18 GB/s
3. GPU kernel read of inputBuffer
clEnqueueNDRangeKernel(): 0.471207 s 5.70 GB/s
verification ok
4. GPU kernel write to outputBuffer
clEnqueueNDRangeKernel(): 0.665229 s 4.04 GB/s
5. CL copy of outputBuffer to copyBuffer
clEnqueueCopyBuffer: 0.041343 s 3.25 GB/s
6. Host mapped read of copyBuffer
clEnqueueMapBuffer(READ): 0.000017 s [ 7710.12 GB/s ]
CPU read: 0.023021 s 5.83 GB/s
verification ok
clEnqueueUnmapMemObject(): 0.000089 s [ 1506.57 GB/s ]
Passed!
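
As a sanity check on the log above, the GB/s figures can be reproduced from the reported times. This is only a sketch: the helper name `bandwidth_gbps` is made up for illustration, 1 GB is taken as 1e9 bytes, and it is an assumption (inferred from the numbers matching, not from the BufferBandwidth source) that the kernel figures count all 20 kernel loops over the buffer.

```python
def bandwidth_gbps(nbytes, seconds, passes=1):
    """Throughput in GB/s (1 GB = 1e9 bytes) for `passes` sweeps over a buffer.

    Hypothetical helper for illustration; not part of the SDK sample.
    """
    return nbytes * passes / seconds / 1e9


BUFFER_BYTES = 134217728  # value passed with -nb (128 MB)
KERNEL_LOOPS = 20         # value passed with -nk; assumed to multiply kernel traffic

# (label, time in seconds from the log, number of passes over the buffer)
steps = [
    ("memset() into mapped copyBuffer", 0.014970, 1),
    ("clEnqueueCopyBuffer copy -> input", 0.042168, 1),
    ("GPU kernel read of inputBuffer", 0.471207, KERNEL_LOOPS),
    ("GPU kernel write to outputBuffer", 0.665229, KERNEL_LOOPS),
    ("clEnqueueCopyBuffer output -> copy", 0.041343, 1),
    ("CPU read of mapped copyBuffer", 0.023021, 1),
]

for label, seconds, passes in steps:
    print(f"{label}: {bandwidth_gbps(BUFFER_BYTES, seconds, passes):.2f} GB/s")
```

Under these assumptions every step stays well below the 15 GB/s quoted for the zero-copy path, and the host-side memset() at roughly 9 GB/s is the fastest transfer in the run.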
Is there anything else that I need to enable in order to get higher bandwidths?
Thanks
-Thejas
AMD APP SDK doesn't support zero copy on Linux
Oh, thanks so much, nou!
So it is actually "clMapBuffer" and not "clEnqueueMapBuffer"? (I had thought that was a typo.)
Thanks again
-Thejas
There is no clMapBuffer function in OpenCL. IMHO the name is shortened in the documentation, since almost all operations are enqueued.
EDIT: Just a correction: 7xxx cards should support zero copy on Linux.
Ok.
Thanks once again.
-Thejas