clAmdBlas/sgemm far from peak performance on FirePro W7000?

Question asked by buqchucker on Jan 4, 2013
Hello everybody,


I just did some tests on a FirePro W7000 using clAmdBlas-1.8.291 on Linux (Fedora 17, kernel 3.6.10-2.fc17, x86_64).


Although tuning was not possible, as written in

the performance of sgemm was lower than expected.


According to the sgemm test that ships with clMagma, the speed came to about 900 GFLOPS, although the W7000 is advertised with 2.4 TFLOPS in single precision.


Interestingly, the double-precision dgemm performance was in spec, at around 150 GFLOPS.
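
To put the two numbers side by side, here is a rough sketch of the arithmetic in Python. The measured values (900 and 150 GFLOPS) and the 2.4 TFLOPS single-precision peak are the ones quoted above. The double-precision peak is an assumption on my part: I take it to be 1/16 of the single-precision rate for Pitcairn. The helper names and the usual 2*M*N*K flop count for GEMM are illustrative conventions, not something read out of the clMagma tester.

```python
def gemm_gflops(m, n, k, seconds):
    """Achieved GFLOPS for C = alpha*A*B + beta*C, counting 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e9


def efficiency(measured_gflops, peak_gflops):
    """Fraction of theoretical peak that was actually reached."""
    return measured_gflops / peak_gflops


SP_PEAK_GFLOPS = 2400.0                # advertised single-precision peak
DP_PEAK_GFLOPS = SP_PEAK_GFLOPS / 16   # ASSUMPTION: DP rate is 1/16 of SP

sgemm_eff = efficiency(900.0, SP_PEAK_GFLOPS)  # measured sgemm
dgemm_eff = efficiency(150.0, DP_PEAK_GFLOPS)  # measured dgemm

print(f"sgemm: {sgemm_eff:.1%} of peak")
print(f"dgemm: {dgemm_eff:.1%} of peak")
```

Under these assumptions sgemm reaches well under half of the advertised peak, while dgemm sits essentially at its peak, which is what makes the single-precision result look odd.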


This was tested on a Pitcairn card (W7000) with driver 9.003.3-121120a-151130C and AMDAPP-SDK v2.8.



Can anyone give a hint as to why the single-precision performance is so far behind the peak performance?