SDK Sample MatrixMultiplication - bad performance on CPU?

Discussion created by c.conti on Jul 12, 2010
I was looking at the performance of matrix multiplication with Stream OpenCL in terms of GFLOPS, and I was surprised to see how underwhelming the results are: I don't even reach 10 GFLOPS on a machine that has a theoretical single-precision peak of 85.12 GFLOPS.
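For reference, here is a minimal sketch of how a GFLOPS figure for a dense matrix multiply is commonly derived, assuming the usual count of 2*M*N*K floating-point operations for an (M x K) by (K x N) product. The function names and the example timing are hypothetical illustrations, not taken from the SDK sample or from gotoBLAS; only the 10 GFLOPS and 85.12 GFLOPS figures come from the numbers above.

```python
def gemm_gflops(m, n, k, seconds):
    """Achieved GFLOPS for an (m x k) * (k x n) multiply that took `seconds`.

    Assumes the conventional operation count of 2*m*n*k
    (one multiply and one add per inner-product term).
    """
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


def fraction_of_peak(achieved_gflops, peak_gflops):
    """Ratio of achieved to theoretical peak throughput."""
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical timing: a 1024 x 1024 x 1024 multiply finishing in 0.25 s.
    print(gemm_gflops(1024, 1024, 1024, 0.25))
    # Figures from the post: just under 10 GFLOPS against an 85.12 GFLOPS peak.
    print(fraction_of_peak(10.0, 85.12))
```

With the figures quoted above, 10 GFLOPS out of 85.12 GFLOPS works out to less than 12% of the theoretical peak.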


Here's the plot of the results I got for the Stream SDK MatrixMultiplication sample and for GotoBLAS sgemm:



Can someone explain why I get such poor performance?