
jski
Journeyman III

Some initial simple_matmult results using SDK 1.2 + a 4870?

Using (with a Radeon 4870):

./simple_matmult_d -x 1000 -y 1000 -i 100 -t -p 

I got:

Width   Height  Iterations      CPU Total Time  GPU Total Time  Gflops          Speedup
1000    1000    100             876.598000      10.756000       17.317266       81.498512

Are these numbers consistent with others' results? (i.e., is this what I should expect?)
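For anyone who wants to sanity-check the table, here is a minimal sketch of how the Gflops and Speedup columns appear to be derived from the timings. It is an inference from the printed numbers, not taken from the sample's source: it assumes the times are in seconds, that each iteration costs 2 * width * height * width floating-point operations (one multiply and one add per inner-product term, with the inner dimension equal to the width), and that "Giga" is counted as 2^30 rather than 10^9. The function names are hypothetical.

```python
# Hypothetical helpers; the formula is inferred from the printed output,
# not copied from the simple_matmult sample.

GIBI = 2 ** 30  # assumption: the sample divides by 2^30, not 10^9


def matmul_gflops(width, height, iterations, gpu_seconds):
    """Estimated Gflops for `iterations` matrix multiplies.

    Assumes an inner dimension equal to `width`, so one iteration
    costs 2 * width * height * width floating-point operations.
    """
    flops = 2.0 * width * height * width * iterations
    return flops / gpu_seconds / GIBI


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU total time to GPU total time."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # Timings from the single-precision run above.
    print(matmul_gflops(1000, 1000, 100, 10.756))
    print(speedup(876.598, 10.756))
```

With the timings above this reproduces the reported 17.317266 Gflops and 81.498512 speedup to within rounding. If "Giga" is taken as 10^9 instead, the same run works out to roughly 18.6 Gflops.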

---jski

6 Replies
jski
Journeyman III

Results using double precision:

./double_precision_simple_matmult_d -x 1000 -y 1000 -i 100 -p -t

Width   Height  Iterations      CPU Total Time  GPU Total Time  Gflops          Speedup
1000    1000    100             1139.733000     12.64600        14.729125       90.125969

This runs counter to what I expected (a smaller speedup in double precision), so I have to assume that AMD made some architectural changes to the GPU to improve double-precision performance.

---jski

moodz
Journeyman III

SDK 1.2 Catalyst 8.8 Fedora 8_64

HD4870 Q6600 3.419 GHz

./double_precision_simple_matmult_d -x 1000 -y 1000 -i 100 -p -t

Width   Height  Iterations      GPU Total Time  Gflops         
1000    1000    10              1.184000        15.731800      

-p Compare performance with CPU.
Width   Height  Iterations      CPU Total Time  GPU Total Time  Gflops          Speedup        
1000    1000    10              97.115000       1.184000        15.731800       82.022804 

Again only did 10 iterations ... approx 15 mins for CPU performance.

The GPU figures were approx the same for 100 iterations.

So I guess your tests are indicative of the performance you could expect from a 4870.

jski
Journeyman III

Well I read last night that AMD incorporated full IEEE 754 floating point support into the RV770 (Radeon 4870 & 4850).  ---jski

moodz
Journeyman III

I am trying to find a test that will generate over a teraflop ...

moodz
Journeyman III

HD4870

Q6600 @ 3.419 GHz

./simple_matmult_d -x 1000 -y 1000 -i 10 -t -p

Width   Height  Iterations      GPU Total Time  Gflops         
1000    1000    10              1.027000        18.136759      

-p Compare performance with CPU.
Width   Height  Iterations      CPU Total Time  GPU Total Time  Gflops          Speedup        
1000    1000    10              71.004000       1.027000        18.136759       69.137293

I couldn't wait 15 minutes for the CPU test!


moodz, a kernel with a large number of arithmetic operations and few memory accesses will generate this.
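To put a rough number on "not many memory accesses", here is a small sketch using a simplified roofline-style model, which is swapped in for illustration and is not something the SDK reports. The two hardware figures are assumptions taken from the published HD 4870 specifications rather than from this thread: about 1200 Gflops single-precision peak (800 stream processors x 2 flops x 0.75 GHz) and about 115.2 GB/s memory bandwidth. The function names are hypothetical.

```python
# Simplified roofline-style estimate. Both constants are assumptions
# based on published HD 4870 specifications, not measurements.

PEAK_GFLOPS = 1200.0    # assumed single-precision peak
BANDWIDTH_GBS = 115.2   # assumed memory bandwidth in GB/s


def attainable_gflops(flops_per_byte):
    """Upper bound on throughput for a kernel that performs
    `flops_per_byte` arithmetic operations per byte of memory traffic."""
    return min(PEAK_GFLOPS, flops_per_byte * BANDWIDTH_GBS)


def min_intensity_for_peak():
    """Flops per byte needed before memory bandwidth stops being the limit."""
    return PEAK_GFLOPS / BANDWIDTH_GBS


if __name__ == "__main__":
    print(min_intensity_for_peak())
    for intensity in (0.25, 1.0, 4.0, 16.0):
        print(intensity, attainable_gflops(intensity))
```

Under these assumptions a kernel needs a bit more than 10 flops per byte moved before the arithmetic units, rather than memory, become the bottleneck, so a test aiming for a teraflop has to do a lot of math on each value it loads.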