Archives Discussions

thejascr · ‎09-28-2012

Hi,

I have an AMD FirePro V4800 and AMD APP SDK 2.6 on Linux kernel 3.0.

I am experimenting with the TransferOverlap program of the APP SDK.

But it is not working for me: I see the same running time with overlap on and off (about 15.5 s for the complete test in both cases).

I have also run the APP profiler and even there I do not see any overlap.

Here is the program output:

=======================================================================================

thejascr@amd-fusion:/opt/bk-AMDAPP/samples/opencl/bin/x86_64$ ./TransferOverlap -d 1 -x 1000 -k 100000 -I 3 // (with overlap)

Platform 0 : Advanced Micro Devices, Inc.

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : BeaverCreek Device ID is 0x2652660

Device 1 : Redwood Device ID is 0x2811450

Device 2 : Redwood Device ID is 0x2a573a0

Build: DEBUG

GPU work items: 64

Buffer size: 1024

Timing loops: 50

Kernel loops: 100000

Wavefronts/SIMD: 7

memset/kernel overlap: yes

inputBuffer: CL_MEM_READ_ONLYCL_MEM_ALLOC_HOST_PTR

AVERAGES (over loops3 - 49, use -l to show complete log)

------------------------------------------------------------------------------------------

Acquire and fill: inputBuffer1

clWaitForEvents() + memset() 0.000004 s 0.27 GB/s

clEnqueueUnmapMemObject() 0.154325 s 0.00 GB/s

Loop time 0.309886 s

Launch map: inputBuffer2

clEnqueueMapBuffer(MAP_WRITE) 0.000025 s 0.04 GB/s

Verify: resultBuffer2

clEnqueueMapBuffer(MAP_WRITE) 0.000296 s 0.00 GB/s

CPU reduction 0.000001 s

verification ok

clEnqueueUnmapMemObject() 0.000147 s 0.00 GB/s

Launch GPU kernel: inputBuffer1

clEnqueueNDRangeKernel() 0.000029 s

Acquire and fill: inputBuffer2

clWaitForEvents() + memset() 0.000004 s 0.27 GB/s

clEnqueueUnmapMemObject() 0.154495 s 0.00 GB/s

Launch map: inputBuffer1

clEnqueueMapBuffer(MAP_WRITE) 0.000026 s 0.04 GB/s

Verify: resultBuffer1

clEnqueueMapBuffer(MAP_WRITE) 0.000301 s 0.00 GB/s

CPU reduction 0.000001 s

verification ok

clEnqueueUnmapMemObject() 0.000145 s 0.00 GB/s

Launch GPU kernel: inputBuffer2

clEnqueueNDRangeKernel() 0.000028 s

Complete test time:15.5218 s

====================================================================================

thejascr@amd-fusion:/opt/bk-AMDAPP/samples/opencl/bin/x86_64$ ./TransferOverlap -d 1 -x 1000 -k 100000 -I 3 -n // (without overlap)

Platform 0 : Advanced Micro Devices, Inc.

Platform found : Advanced Micro Devices, Inc.

Selected Platform Vendor : Advanced Micro Devices, Inc.

Device 0 : BeaverCreek Device ID is 0x1b20660

Device 1 : Redwood Device ID is 0x1bdca50

Device 2 : Redwood Device ID is 0x1bdbeb0

Build: DEBUG

GPU work items: 64

Buffer size: 1024

Timing loops: 50

Kernel loops: 100000

Wavefronts/SIMD: 7

memset/kernel overlap: no

inputBuffer: CL_MEM_READ_ONLYCL_MEM_ALLOC_HOST_PTR

AVERAGES (over loops3 - 49, use -l to show complete log)

--------------------------------------------------------------------------------------

Acquire and fill: inputBuffer1

clWaitForEvents() + memset() 0.000004 s 0.27 GB/s

clEnqueueUnmapMemObject() 0.000282 s 0.00 GB/s

Loop time 0.310473 s

Launch map: inputBuffer2

clEnqueueMapBuffer(MAP_WRITE) 0.000022 s 0.05 GB/s

Verify: resultBuffer2

clEnqueueMapBuffer(MAP_WRITE) 0.000278 s 0.00 GB/s

CPU reduction 0.000001 s

verification ok

clEnqueueUnmapMemObject() 0.000140 s 0.00 GB/s

Launch GPU kernel: inputBuffer1

clEnqueueNDRangeKernel() 0.154493 s

Acquire and fill: inputBuffer2

clWaitForEvents() + memset() 0.000004 s 0.25 GB/s

clEnqueueUnmapMemObject() 0.000286 s 0.00 GB/s

Launch map: inputBuffer1

clEnqueueMapBuffer(MAP_WRITE) 0.000023 s 0.05 GB/s

Verify: resultBuffer1

clEnqueueMapBuffer(MAP_WRITE) 0.000284 s 0.00 GB/s

CPU reduction 0.000001 s

verification ok

clEnqueueUnmapMemObject() 0.000140 s 0.00 GB/s

Launch GPU kernel: inputBuffer2

clEnqueueNDRangeKernel() 0.154432 s

Complete test time:15.5523 s
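
One way to compare the two runs is to do the arithmetic on the logged averages. The sketch below is plain Python, not part of the APP SDK. The numbers are copied from the two logs above, and the `runs` dictionary and `blocking_share` helper are names made up for illustration. It only shows where the host-side time goes in each run; it does not explain the cause.

```python
# Averaged timings in seconds, copied from the two logs above.
# The dictionary layout and key names are illustrative assumptions.
runs = {
    "overlap": {
        "loop_time": 0.309886,
        "unmap": [0.154325, 0.154495],           # clEnqueueUnmapMemObject() under "Acquire and fill"
        "kernel_enqueue": [0.000029, 0.000028],  # clEnqueueNDRangeKernel()
        "total": 15.5218,
    },
    "no_overlap": {
        "loop_time": 0.310473,
        "unmap": [0.000282, 0.000286],
        "kernel_enqueue": [0.154493, 0.154432],
        "total": 15.5523,
    },
}


def blocking_share(run):
    """Fraction of the loop time spent in the slower of the two calls, per buffer."""
    longest = [max(u, k) for u, k in zip(run["unmap"], run["kernel_enqueue"])]
    return sum(longest) / run["loop_time"]


for name, run in runs.items():
    print(f"{name:>10}: loop {run['loop_time']:.6f} s, "
          f"blocking share {blocking_share(run):.3f}")

# Ratio of complete test times; a value near 1.0 means no measurable gain.
speedup = runs["no_overlap"]["total"] / runs["overlap"]["total"]
print(f"speedup with overlap enabled: {speedup:.4f}x")
```

In both runs, two calls of roughly 0.154 s each account for nearly all of the roughly 0.31 s loop time. Only the call that blocks changes: clEnqueueUnmapMemObject() with overlap on, clEnqueueNDRangeKernel() with overlap off.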

binying · ‎10-02-2012

So you've already found out why...

http://devgurus.amd.com/thread/159775

thejascr · ‎10-02-2012

yep

TransferOverlap not working on Firepro v4800 SDK 2.6 Linux 3.0