While trying to check the performance of my computer's GPU (Pitcairn),
I noticed that sgemm and dgemm return completely wrong results,
even when the calculation is submitted to the host CPU, which is the default.
The system consists of an Intel Xeon E5-2670 CPU and an AMD/ATI FirePro W7000 card,
running Ubuntu 14.04.
The test setup consists of two 1500 x 1500 matrices, initialized element-wise to random values between 0 and 1.
The destination matrix was calculated once 'by-hand' and once by a dgemm call.
The maximum absolute element-wise difference between 'by-hand' and dgemm was
using acml-220.127.116.11/gfortran64. The result is clearly wrong.
To confirm that ACML 6 did send the call to the host CPU, here is the log line (ACML_LOG_FILTER=1):
GemmThreshold: transa( N ), transb( N ), m( 1500 ), n( 1500 ), k( 1500 ), alpha.real( 1 ), alpha.imag( 0 ), lda( 1500 ), ldb( 1500 ), beta.real( 0 ), beta.imag( 0 ), ldc( 1500 ), prec( d ), usegpu( 0 )
In contrast, linking the same binary with acml-5.3.1/gfortran64 gives the
roughly expected difference of
For checking and reference, the small test program is attached.
The test case here uses double precision only, but the same
happens with single precision.
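
For anyone who wants to reproduce the idea without the attachment, the check can be sketched roughly as below. This is not the attached program: it is a minimal pure-Python illustration, and the names `check_gemm`, `matmul_by_hand` and `gemm_stand_in` are made up for this sketch. The `gemm` argument is where a wrapper around the library's dgemm would go. The tolerance is only the first-order rounding bound for a length-k dot product of entries in [0, 1), roughly 2 * k * k * 2^-53 for the difference of two such results, so it is a sanity threshold and not an exact figure.

```python
import random


def random_matrix(n, rng):
    """n x n matrix with entries drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul_by_hand(a, b):
    """Reference product C = A * B using plain triple loops."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            for j in range(m):
                c[i][j] += aip * b[p][j]
    return c


def gemm_stand_in(a, b):
    """Hypothetical stand-in for the library call (NOT ACML's dgemm)."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)]
            for row in a]


def max_abs_diff(x, y):
    """Maximum absolute element-wise difference of two matrices."""
    return max(abs(xv - yv)
               for xr, yr in zip(x, y)
               for xv, yv in zip(xr, yr))


def check_gemm(gemm, n=50, seed=0):
    """Return (max difference, rough tolerance) for gemm vs. the reference."""
    rng = random.Random(seed)
    a = random_matrix(n, rng)
    b = random_matrix(n, rng)
    diff = max_abs_diff(matmul_by_hand(a, b), gemm(a, b))
    # First-order bound: each dot product is off by at most about k * u * k
    # (u = 2**-53, entries below 1), so two results differ by at most twice that.
    tol = 2.0 * n * n * 2.0 ** -53
    return diff, tol


if __name__ == "__main__":
    diff, tol = check_gemm(gemm_stand_in)
    print("max abs diff:", diff, "tolerance:", tol,
          "OK" if diff <= tol else "WRONG")
```

The sketch uses n = 50 only to keep the pure-Python loops quick. For the 1500 x 1500 case described above, the same bound comes out at about 5e-10, so a difference far above that points to a wrong result and not to rounding.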
Has anybody noticed this before?