Using a Radeon 4870 with:
./simple_matmult_d -x 1000 -y 1000 -i 100 -t -p
I got:
Width Height Iterations CPU Total Time GPU Total Time
1000 1000 100 876.598000 10.756000
Gflops Speedup
17.317266 81.498512
Are these numbers consistent with others' results? (i.e., is this what I should expect?)
---jski
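For anyone checking their own runs against these figures, here is a rough sketch of how the Gflops and Speedup columns appear to be derived. It assumes a square N x N multiply costs 2*N^3 floating-point operations and that the sample divides by 2^30 rather than 10^9; both are inferred from the posted numbers, not taken from the SDK source. The function names are made up for illustration.

```python
# Sketch: reproduce the Gflops and Speedup columns from the posted timings.
# Assumptions (inferred from the numbers, not from the SDK source):
#   - a square n x n matrix multiply costs 2 * n**3 floating-point operations
#   - "giga" in the sample output means 2**30, not 10**9

def gflops(n, iterations, gpu_seconds, giga=2**30):
    """Throughput for `iterations` n x n multiplies finished in gpu_seconds."""
    total_flops = 2 * n**3 * iterations
    return total_flops / gpu_seconds / giga

def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU total time to GPU total time."""
    return cpu_seconds / gpu_seconds

if __name__ == "__main__":
    # Timings from the single-precision run in the post above.
    print("Gflops :", gflops(1000, 100, 10.756))
    print("Speedup:", speedup(876.598, 10.756))
```

With the timings from the post, both values land on the reported 17.317266 and 81.498512 to within rounding, so the two columns look consistent with each other.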
Results using double precision:
./double_precision_simple_matmult_d -x 1000 -y 1000 -i 100 -p -t
Width Height Iterations CPU Total Time GPU Total Time
1000 1000 100 1139.733000 12.646000
Gflops Speedup
14.729125 90.125969
This runs counter to what I expected (a smaller speedup in double precision), so I have to assume that AMD made some architectural changes to the GPU to improve double-precision performance.
---jski
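A quick way to see why the speedup went up rather than down is to compare how much each side slowed when moving from single to double precision. This is only arithmetic on the timings posted above; the variable names are illustrative.

```python
# Sketch: compare the single -> double precision slowdown on CPU and GPU,
# using only the total times posted above (1000 x 1000, 100 iterations).

cpu_single, gpu_single = 876.598, 10.756    # seconds, single precision
cpu_double, gpu_double = 1139.733, 12.646   # seconds, double precision

gpu_slowdown = gpu_double / gpu_single      # roughly 1.18
cpu_slowdown = cpu_double / cpu_single      # roughly 1.30

# Speedup is cpu_time / gpu_time, so it changes by cpu_slowdown / gpu_slowdown.
speedup_change = cpu_slowdown / gpu_slowdown

if __name__ == "__main__":
    print("GPU slowdown  :", gpu_slowdown)
    print("CPU slowdown  :", cpu_slowdown)
    print("Speedup change:", speedup_change)
```

The GPU did get slower in double precision (14.73 vs 17.32 Gflops), but the CPU reference slowed by a larger factor, which is why the reported speedup rose from about 81 to about 90.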
SDK 1.2 Catalyst 8.8 Fedora 8_64
HD4870 Q6600 3.419 GHz
./double_precision_simple_matmult_d -x 1000 -y 1000 -i 10 -p -t
Width Height Iterations GPU Total Time Gflops
1000 1000 10 1.184000 15.731800
-p Compare performance with CPU.
Width Height Iterations CPU Total Time GPU Total Time Gflops Speedup
1000 1000 10 97.115000 1.184000 15.731800 82.022804
Again, I only did 10 iterations ... 100 would take approx 15 mins for the CPU comparison.
The GPU Gflops figures were approx the same for 100 iterations.
So I guess your tests are indicative of the performance you could expect from a 4870.
Well, I read last night that AMD incorporated full IEEE 754 floating-point support into the RV770 (Radeon 4870 and 4850). ---jski
I am trying to find a test that will generate over a teraflop ...
HD4870
Q6600 @ 3.419 GHz
./simple_matmult_d -x 1000 -y 1000 -i 10 -t -p
Width Height Iterations GPU Total Time Gflops
1000 1000 10 1.027000 18.136759
-p Compare performance with CPU.
Width Height Iterations CPU Total Time GPU Total Time Gflops Speedup
1000 1000 10 71.004000 1.027000 18.136759 69.137293
I couldn't wait 15 minutes for the CPU test!
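On the teraflop question, a rough sketch of how far this run sits from the card's theoretical single-precision peak. The hardware figures (800 stream processors, 750 MHz core clock, one multiply-add counted as 2 flops per cycle) are assumptions taken from the published HD 4870 specs, not from anything measured in this thread, and the comparison ignores the 2^30 vs 10^9 scaling of the sample's Gflops column.

```python
# Sketch: theoretical single-precision peak of an HD 4870 vs the measured figure.
# Assumed hardware figures (published specs, not measured here):
STREAM_PROCESSORS = 800     # assumed ALU count
CORE_CLOCK_GHZ = 0.75       # assumed stock core clock
FLOPS_PER_CYCLE = 2         # one multiply-add counted as 2 flops

peak_gflops = STREAM_PROCESSORS * CORE_CLOCK_GHZ * FLOPS_PER_CYCLE

# Measured by simple_matmult_d in the post above (10 iterations).
measured_gflops = 18.136759

fraction_of_peak = measured_gflops / peak_gflops

if __name__ == "__main__":
    print("Theoretical peak (Gflops):", peak_gflops)
    print("Fraction of peak reached :", fraction_of_peak)
```

Under those assumptions, simple_matmult reaches under 2% of peak, so a test that gets anywhere near a teraflop would need a far more compute-bound kernel than this sample.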