Archives Discussions

thejascr · ‎03-24-2012

Hi,

I have an AMD A8-3850 Fusion APU. I have installed the AMD APP SDK 2.6 and am trying out the BufferBandwidth sample to check the maximum bandwidth that I can get on the discrete and the on-die GPU.

I have no issues with the discrete GPU. For the on-die GPU, however, this blog post

http://blogs.amd.com/developer/2011/08/01/cpu-to-gpu-data-transfers-exceed-15gbs-using-apu-zero-copy...

and the AMD APP programmer's guide both describe a zero-copy path that can reach up to 15 GB/s.

First, I would like to clarify whether the API that the blog and the APP guide call "clMapBuffer" is really "clEnqueueMapBuffer". There is no API called clMapBuffer in the library, so I am assuming it is clEnqueueMapBuffer.

If that is true, I have been running the BufferBandwidth sample that came with the SDK with various options to see whether I can get up to 15 GB/s. So far I have only reached about 8 GB/s.

As suggested in a comment on the above blog, I even tried setting -nwk to 15, but I still do not see any improvement.

Here is my command line for a buffer size of 128 MB:

./BufferBandwidth -t 2 -d 0 -nwk 15 -nl 20 -nr 1 -nk 20 -nb 134217728 -nw 7 -s 2 -if 0 -of 1 -cf 5 -cf 2

And here is the output for the integrated GPU:

=======================================================

Device 0 BeaverCreek

Build: DEBUG

GPU work items: 8192

Buffer size: 134217728

CPU workers: 1

Timing loops: 20

Repeats: 1

Kernel loops: 20

inputBuffer: CL_MEM_READ_ONLY

outputBuffer: CL_MEM_WRITE_ONLY

copyBuffer: CL_MEM_READ_WRITE CL_MEM_ALLOC_HOST_PTR

Host baseline (single thread, naive):

Timer resolution 256.225 ns

Page fault 2047.7

CPU read 6.16153 GB/s

memcpy() 3.31839 GB/s

memset(,1,) 9.00288 GB/s

memset(,0,) 8.81274 GB/s

AVERAGES (over loops 2 - 19, use -l for complete log)

--------

1. Host mapped write to copyBuffer

clEnqueueMapBuffer(WRITE): 0.000016 s [ 8285.50 GB/s ]

memset(): 0.014970 s 8.97 GB/s

clEnqueueUnmapMemObject(): 0.000084 s [ 1604.42 GB/s ]

2. CL copy of copyBuffer to inputBuffer

clEnqueueCopyBuffer: 0.042168 s 3.18 GB/s

3. GPU kernel read of inputBuffer

clEnqueueNDRangeKernel(): 0.471207 s 5.70 GB/s

verification ok

4. GPU kernel write to outputBuffer

clEnqueueNDRangeKernel(): 0.665229 s 4.04 GB/s

5. CL copy of outputBuffer to copyBuffer

clEnqueueCopyBuffer: 0.041343 s 3.25 GB/s

6. Host mapped read of copyBuffer

clEnqueueMapBuffer(READ): 0.000017 s [ 7710.12 GB/s ]

CPU read: 0.023021 s 5.83 GB/s

verification ok

clEnqueueUnmapMemObject(): 0.000089 s [ 1506.57 GB/s ]

Passed!
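
As a sanity check on the figures above: the GB/s values appear to be bytes moved divided by elapsed time, in decimal gigabytes (1 GB = 1e9 bytes). The kernel figures only match if the buffer is counted once per kernel loop (20 here). That is an inference from the numbers in this log, not something taken from the sample's source. A minimal sketch in C, where bandwidth_gbps is a hypothetical helper name:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the SDK sample): bandwidth in decimal
 * GB/s, computed as bytes moved per pass times the number of passes,
 * divided by the elapsed time in seconds.
 */
double bandwidth_gbps(size_t bytes_per_pass, unsigned passes, double seconds)
{
    return ((double)bytes_per_pass * (double)passes) / seconds / 1e9;
}
```

For example, bandwidth_gbps(134217728, 1, 0.014970) reproduces the 8.97 GB/s reported for memset(), and bandwidth_gbps(134217728, 20, 0.471207) reproduces the 5.70 GB/s reported for the kernel read. The bracketed map/unmap figures are the buffer size divided by a call that takes only microseconds, so they do not measure a real data transfer.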

Is there anything else that I need to enable in order to get higher bandwidths?

Thanks

-Thejas

nou · ‎03-24-2012

AMD APP SDK doesn't support zero copy on Linux

thejascr · ‎03-24-2012

Oh, thanks so much, nou!

So it is actually "clMapBuffer" and not really "clEnqueueMapBuffer"? (I had thought that was a typo.)

Thanks again

-Thejas

nou · ‎03-24-2012

There is no clMapBuffer function in OpenCL. IMHO the name is shortened in the documentation, since almost all operations are enqueued.

EDIT: One correction: 7xxx cards should support zero copy on Linux.

thejascr · ‎03-24-2012

Ok.

Thanks once again.

-Thejas

Not Seeing High throughputs of the Zero Copy on the APU