Hi,
I am using an AMD Radeon HD 6770 graphics card with AMD SDK v2.7 and Catalyst driver 12.8 on RHEL 6.0. When I run the "FFT" sample program with the timing option on the CPU device (Intel Core2 Duo) and on the GPU device, the program runs faster on the CPU than on the GPU.
[root@localhost x86]# ./FFT -t
Platform 0 : Advanced Micro Devices, Inc.
Original Input Real
15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168
Original Input Img
0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.0443163 0.879562
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Juniper Device ID is 0x9bee888
Build Options are : -x clc++
Executing kernel for 1 iterations
-------------------------------------------
Output real
131643 -1085.95 -997.15 -1791.52 532.118 1659.74 -166.271 969.692 1189.76 -862.707
Output img
514.23 2289.84 936.489 -603.839 699.7 1018.18 1900.06 795.439 -1328.03 -293.334
Length Time(sec) [Transfer+Kernel]Time(sec)
26214400 0.932 0.492
And with the CPU:
[root@localhost x86]# ./FFT -t --device cpu
Platform 0 : Advanced Micro Devices, Inc.
Original Input Real
15.3732 201.81 51.9855 89.2322 92.572 34.4675 96.2478 66.3863 11.345 225.168
Original Input Img
0.0600514 0.788318 0.203068 0.348563 0.361609 0.134639 0.375968 0.259322 0.0443163 0.879562
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz Device ID is 0x8d62ab8
Build Options are : -x clc++
Executing kernel for 1 iterations
-------------------------------------------
Output real
131643 -1085.95 -997.15 -1791.52 532.119 1659.74 -166.27 969.692 1189.76 -862.707
Output img
514.23 2289.84 936.49 -603.839 699.7 1018.18 1900.06 795.439 -1328.03 -293.334
Length Time(sec) [Transfer+Kernel]Time(sec)
26214400 0.695 0.294
How can I improve performance on the GPU?
Thanks
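To make the two runs above easier to compare, here is a minimal sketch that splits each reported total into the "[Transfer+Kernel]" part and whatever is left over. It assumes that the "Time(sec)" column is a total that already includes the "[Transfer+Kernel]Time(sec)" column, so the remainder would be one-time setup work; that reading of the columns is an assumption, not something the output states. The helper name split_time is made up for this example.

```python
# Timings copied from the two runs above, in seconds.
runs = {
    "GPU (Juniper)": {"total": 0.932, "transfer_kernel": 0.492},
    "CPU (Core2 Duo E7500)": {"total": 0.695, "transfer_kernel": 0.294},
}

def split_time(total, transfer_kernel):
    """Return (remainder, fraction of total spent in transfer + kernel).

    Assumes 'total' includes 'transfer_kernel' (an assumption about the
    sample's output columns).
    """
    remainder = total - transfer_kernel
    return remainder, transfer_kernel / total

for name, t in runs.items():
    remainder, frac = split_time(t["total"], t["transfer_kernel"])
    print(f"{name}: remainder = {remainder:.3f} s, "
          f"transfer+kernel = {frac:.0%} of total")
```

If that reading is right, a large share of both totals is spent outside transfer and kernel execution, which matters when judging a single iteration.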
Hi,
D:\Documents\AMD APP\samples\opencl\bin\debug\x86>"D:\Documents\AMD APP\samples\opencl\bin\debug\x86\FFT.exe" -t
Platform 0 : Advanced Micro Devices, Inc.
Original Input Real
3.4375 148.852 180.273 102.375 252.836 25.5078 240.227 255.297 76.25 219.25
Original Input Img
0.0134277 0.581451 0.704193 0.399902 0.98764 0.0996399 0.938385 0.997253 0.297852 0.856445
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Redwood Device ID is 007006A8
Build Options are : -x clc++
Executing kernel for 1 iterations
-------------------------------------------
Output real
134794 -1070.3 -433.119 -611.422 3148.27 3127.49 634.976 2093.02 -2744.55 -1223.81
Output img
526.538 1576.56 344.663 -721.792 1204.88 -1340.41 618.896 253.97 1077.52 144.054
Length Time(sec) [Transfer+Kernel]Time(sec)
1024 0.381548 0.00284864
D:\Documents\AMD APP\samples\opencl\bin\debug\x86>"D:\Documents\AMD APP\samples\opencl\bin\debug\x86\FFT.exe" -t --device cpu
Platform 0 : Advanced Micro Devices, Inc.
Original Input Real
3.4375 148.852 180.273 102.375 252.836 25.5078 240.227 255.297 76.25 219.25
Original Input Img
0.0134277 0.581451 0.704193 0.399902 0.98764 0.0996399 0.938385 0.997253 0.297852 0.856445
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Intel(R) Core(TM) i5 CPU M 480 @ 2.67GHz Device ID is 0283E778
Build Options are : -x clc++
Executing kernel for 1 iterations
-------------------------------------------
Output real
134794 -1070.3 -433.118 -611.422 3148.27 3127.49 634.976 2093.02 -2744.55 -1223.81
Output img
526.538 1576.56 344.663 -721.792 1204.88 -1340.41 618.897 253.97 1077.52 144.054
Length Time(sec) [Transfer+Kernel]Time(sec)
1024 0.574003 0.000512755
It performs normally on my computer.
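For reference, the two runs above can be put side by side with a small sketch like the following. It only divides the numbers printed by the sample for length 1024; the helper name ratio is made up for this example, and a single iteration is a thin basis for any conclusion.

```python
# Timings copied from the two runs above (length 1024), in seconds.
gpu = {"total": 0.381548, "transfer_kernel": 0.00284864}      # Redwood
cpu = {"total": 0.574003, "transfer_kernel": 0.000512755}     # Core i5 M 480

def ratio(a, b):
    """How many times larger a is than b."""
    return a / b

# Total time: a value above 1 means the CPU run took longer overall.
total_ratio = ratio(cpu["total"], gpu["total"])

# Transfer + kernel time: a value above 1 means the GPU run took longer here.
kernel_ratio = ratio(gpu["transfer_kernel"], cpu["transfer_kernel"])

print(f"CPU total / GPU total: {total_ratio:.2f}")
print(f"GPU transfer+kernel / CPU transfer+kernel: {kernel_ratio:.2f}")
```

On these numbers the GPU run has the smaller total, while the CPU run has the smaller transfer+kernel time, so which device looks faster depends on which column is compared.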
Hi,
Plain C FFT code runs faster than the FFT code that uses the AMD FFT libraries on the CPU, and on the GPU the OpenCL code takes much longer. Can anyone please tell me the reason for this?
Thanks
"FFT c code runs faster than FFT code with AMD fft libraries on CPU And on GPU OpenCL code is taking much more time .Can anyone please tell me the reason for this?"
Does that still happen when you increase the job size?
With a 4-point FFT, the C code runs faster than the OpenCL code. Should I increase the size, for example to a 512-point FFT?
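A rough way to see why job size matters is a toy cost model: the GPU path pays a fixed launch overhead plus transfer time before any arithmetic, while plain C code on the CPU starts right away. The sketch below is not a measurement. Every constant in it (throughput, overhead, bandwidth) is a hypothetical placeholder chosen only to show the shape of the trade-off, and the function names are made up.

```python
import math

def fft_ops(n):
    """Rough work estimate for a radix-2 FFT: proportional to n * log2(n)."""
    return n * math.log2(n)

def cpu_time(n, ops_per_sec=1e9):
    # Hypothetical CPU throughput; no launch or transfer cost.
    return fft_ops(n) / ops_per_sec

def gpu_time(n, ops_per_sec=2e10, fixed_overhead=1e-4, bytes_per_sec=5e9):
    # Hypothetical GPU: higher throughput, but a fixed per-call overhead
    # plus a transfer of n complex floats (8 bytes each) in and out.
    transfer = 2 * (8 * n) / bytes_per_sec
    return fixed_overhead + transfer + fft_ops(n) / ops_per_sec

for n in (4, 512, 2**20):
    c, g = cpu_time(n), gpu_time(n)
    faster = "CPU" if c < g else "GPU"
    print(f"n = {n:>8}: cpu = {c:.3e} s, gpu = {g:.3e} s -> {faster} faster")
```

With these made-up constants the CPU wins at 4 and 512 points and the GPU only wins at much larger sizes; the real crossover on a given machine has to be measured.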