I just did some tests on a FirePro W7000 using
clAmdBlas-1.8.291 on Linux (Fedora 17, kernel 3.6.10-2.fc17, x86_64).
Although tuning was not possible, as written in
the performance of sgemm was lower than expected.
According to the sgemm test shipped with clMagma,
the throughput amounted to 900 GFLOPS, although the W7000
is advertised with 2.4 TFLOPS.
Interestingly, the double-precision dgemm performance
was within spec, that is, around 150 GFLOPS.
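
For reference, this is roughly how a gemm timing translates into GFLOPS and into a fraction of the advertised peak. It is only a sketch: the function names are made up for illustration, and it assumes the usual 2*M*N*K operation count for gemm, which may not be exactly what the clMagma tester reports.

```python
def gemm_gflops(m, n, k, seconds):
    """Achieved GFLOPS for an (m x k) times (k x n) matrix product.

    Assumes the conventional count of 2*m*n*k floating-point operations
    (one multiply and one add per inner-product term).
    """
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Ratio of measured throughput to the advertised peak."""
    return achieved_gflops / peak_gflops


# Numbers from this post: 900 GFLOPS measured for sgemm,
# 2.4 TFLOPS (2400 GFLOPS) advertised single-precision peak.
sgemm_fraction = fraction_of_peak(900.0, 2400.0)
print(f"sgemm reaches {sgemm_fraction:.1%} of the advertised peak")
```

With the figures above, sgemm lands at 37.5% of the advertised single-precision peak.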
This was tested on a Pitcairn card (W7000) with driver 9.003.3-121120a-151130C
and AMD APP SDK v2.8.
Can anyone give a hint as to why the single precision
performance is so far behind the peak performance?