Hi,
I have an AMD FirePro V4800 and AMD APP SDK 2.6 on Linux kernel 3.0.
I am experimenting with the TransferOverlap sample from the APP SDK,
but it is not working for me: I see the same running time with overlap on and with overlap off.
I have also run the APP Profiler, and it does not show any overlap either.
Here is the program output for both runs:
=======================================================================================
thejascr@amd-fusion:/opt/bk-AMDAPP/samples/opencl/bin/x86_64$ ./TransferOverlap -d 1 -x 1000 -k 100000 -I 3 // (with overlap)
Platform 0 : Advanced Micro Devices, Inc.
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : BeaverCreek Device ID is 0x2652660
Device 1 : Redwood Device ID is 0x2811450
Device 2 : Redwood Device ID is 0x2a573a0
Build: DEBUG
GPU work items: 64
Buffer size: 1024
Timing loops: 50
Kernel loops: 100000
Wavefronts/SIMD: 7
memset/kernel overlap: yes
inputBuffer: CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR
AVERAGES (over loops 3 - 49, use -l to show complete log)
------------------------------------------------------------------------------------------
Acquire and fill: inputBuffer1
clWaitForEvents() + memset() 0.000004 s 0.27 GB/s
clEnqueueUnmapMemObject() 0.154325 s 0.00 GB/s
Loop time 0.309886 s
Launch map: inputBuffer2
clEnqueueMapBuffer(MAP_WRITE) 0.000025 s 0.04 GB/s
Verify: resultBuffer2
clEnqueueMapBuffer(MAP_WRITE) 0.000296 s 0.00 GB/s
CPU reduction 0.000001 s
verification ok
clEnqueueUnmapMemObject() 0.000147 s 0.00 GB/s
Launch GPU kernel: inputBuffer1
clEnqueueNDRangeKernel() 0.000029 s
Acquire and fill: inputBuffer2
clWaitForEvents() + memset() 0.000004 s 0.27 GB/s
clEnqueueUnmapMemObject() 0.154495 s 0.00 GB/s
Launch map: inputBuffer1
clEnqueueMapBuffer(MAP_WRITE) 0.000026 s 0.04 GB/s
Verify: resultBuffer1
clEnqueueMapBuffer(MAP_WRITE) 0.000301 s 0.00 GB/s
CPU reduction 0.000001 s
verification ok
clEnqueueUnmapMemObject() 0.000145 s 0.00 GB/s
Launch GPU kernel: inputBuffer2
clEnqueueNDRangeKernel() 0.000028 s
Complete test time:15.5218 s
====================================================================================
thejascr@amd-fusion:/opt/bk-AMDAPP/samples/opencl/bin/x86_64$ ./TransferOverlap -d 1 -x 1000 -k 100000 -I 3 -n // (without overlap)
Platform 0 : Advanced Micro Devices, Inc.
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : BeaverCreek Device ID is 0x1b20660
Device 1 : Redwood Device ID is 0x1bdca50
Device 2 : Redwood Device ID is 0x1bdbeb0
Build: DEBUG
GPU work items: 64
Buffer size: 1024
Timing loops: 50
Kernel loops: 100000
Wavefronts/SIMD: 7
memset/kernel overlap: no
inputBuffer: CL_MEM_READ_ONLY CL_MEM_ALLOC_HOST_PTR
AVERAGES (over loops 3 - 49, use -l to show complete log)
--------------------------------------------------------------------------------------
Acquire and fill: inputBuffer1
clWaitForEvents() + memset() 0.000004 s 0.27 GB/s
clEnqueueUnmapMemObject() 0.000282 s 0.00 GB/s
Loop time 0.310473 s
Launch map: inputBuffer2
clEnqueueMapBuffer(MAP_WRITE) 0.000022 s 0.05 GB/s
Verify: resultBuffer2
clEnqueueMapBuffer(MAP_WRITE) 0.000278 s 0.00 GB/s
CPU reduction 0.000001 s
verification ok
clEnqueueUnmapMemObject() 0.000140 s 0.00 GB/s
Launch GPU kernel: inputBuffer1
clEnqueueNDRangeKernel() 0.154493 s
Acquire and fill: inputBuffer2
clWaitForEvents() + memset() 0.000004 s 0.25 GB/s
clEnqueueUnmapMemObject() 0.000286 s 0.00 GB/s
Launch map: inputBuffer1
clEnqueueMapBuffer(MAP_WRITE) 0.000023 s 0.05 GB/s
Verify: resultBuffer1
clEnqueueMapBuffer(MAP_WRITE) 0.000284 s 0.00 GB/s
CPU reduction 0.000001 s
verification ok
clEnqueueUnmapMemObject() 0.000140 s 0.00 GB/s
Launch GPU kernel: inputBuffer2
clEnqueueNDRangeKernel() 0.154432 s
Complete test time:15.5523 s
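
As a rough sanity check on the two logs above, the sketch below compares the long blocking calls in each run against the reported loop time. The timings are copied from the output; the helper names (`blocking_fraction`, `best_case_saving`) are made up for illustration, and the sketch assumes the memset() is the only host-side work that overlap could hide.

```python
# Timings copied from the two runs above (seconds, averaged per loop).
RUNS = {
    "overlap": {
        # With overlap on, the long wait shows up in clEnqueueUnmapMemObject().
        "blocking_calls": [0.154325, 0.154495],
        "memset": 0.000004,
        "loop_time": 0.309886,
    },
    "no overlap": {
        # With -n, the long wait shows up in clEnqueueNDRangeKernel().
        "blocking_calls": [0.154493, 0.154432],
        "memset": 0.000004,
        "loop_time": 0.310473,
    },
}


def blocking_fraction(run):
    """Share of the loop time spent inside the two long blocking calls."""
    return sum(run["blocking_calls"]) / run["loop_time"]


def best_case_saving(run):
    """Upper bound on time hidden per loop if both memsets overlapped fully.

    Assumption: the memset is the only host work available to overlap.
    """
    return 2 * run["memset"]


for name, run in RUNS.items():
    print(f"{name}: blocking calls = {blocking_fraction(run):.1%} of loop, "
          f"max memset saving = {best_case_saving(run) * 1e6:.0f} us")
```

If that assumption holds, both runs spend over 99% of each loop inside two blocking calls of about 0.154 s each, while the memset() takes only microseconds. With a 1024-byte buffer there may simply be too little host work for any overlap to show up in the total time.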