c.conti

SDK Sample MatrixMultiplication - bad performance on CPU?

Discussion created by c.conti on Jul 12, 2010
Latest reply on Jul 12, 2010 by cjang

Hi,

I was measuring the matrix multiplication performance of ATI Stream OpenCL on the CPU in terms of GFLOPS, and the results are surprisingly underwhelming: I don't even reach 10 GFLOPS on a machine with a theoretical single-precision peak of 85.12 GFLOPS.
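For reference, matrix multiplication GFLOPS are conventionally computed by counting 2n³ floating-point operations (n³ multiplies plus n³ adds) for an n x n product and dividing by the elapsed time. A CPU's theoretical peak is cores x clock x flops per cycle per core. The sketch below just does that arithmetic. The machine parameters (4 cores at 2.66 GHz, 8 single-precision flops per cycle via SSE) and the timing are hypothetical values chosen so the peak matches the 85.12 GFLOPS quoted above; they are not details of the actual machine.

```python
def sgemm_gflops(n: int, seconds: float) -> float:
    """GFLOPS achieved by an n x n times n x n single-precision multiply.

    Uses the conventional count of 2*n^3 floating-point operations
    (n^3 multiplies plus n^3 adds) divided by the elapsed time.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 2.0 * n ** 3 / seconds / 1e9


def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: cores x clock (GHz) x flops per cycle per core."""
    return cores * ghz * flops_per_cycle


# Hypothetical machine: 4 cores at 2.66 GHz, 8 single-precision flops per
# cycle per core (one 4-wide SSE add plus one 4-wide SSE multiply).
peak = peak_gflops(cores=4, ghz=2.66, flops_per_cycle=8)
print(f"peak: {peak:.2f} GFLOPS")

# Hypothetical timing: a 1024 x 1024 multiply that took 0.25 s.
achieved = sgemm_gflops(1024, 0.25)
print(f"achieved: {achieved:.2f} GFLOPS ({100 * achieved / peak:.1f}% of peak)")
```

With these made-up numbers a 1024 x 1024 multiply lands at roughly 10% of peak, which is the kind of gap described above.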

Here's a plot of GFLOPS versus matrix order for the Stream SDK MatrixMultiplication sample and for GotoBLAS sgemm:

http://img638.imageshack.us/img638/1225/gflopsvsmatrixorder.png

Can someone explain why the performance is so poor?