I'm beginning a CPU/GPU research project, and I decided to use OpenCL because it can run code on both platforms. I have a 2.66 GHz Core 2 Duo running Ubuntu 9.04 and a Radeon 4350 GPU. I've installed the OpenCL beta driver and the ATI Stream SDK (v2, beta 4) and have started looking at some benchmarks for the sample code.
Interestingly, of the OpenCL samples from the SDK I've tried (MatrixMultiplication, MatrixTranspose, MersenneTwister, and a couple of others), the code mostly runs significantly faster on the CPU than on the GPU. The first part of the attached snippet shows the results of multiplying two 2048x2048 matrices together, where the CPU beats the GPU by a factor of about 1.16. For MersenneTwister the CPU is more than 8x faster.
So, I'm curious: (1) is this an expected result? (2) should I be rewriting the code to take better advantage of the GPU's architecture?
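To make question (2) concrete, the kind of restructuring I have in mind is tiling through __local memory, along these lines. This is my own sketch, not the SDK sample's actual kernel, and it assumes square row-major matrices with n divisible by the tile size:

```c
/* Hypothetical tiled matrix-multiply kernel. Each 16x16 work-group stages
 * one tile of A and one tile of B in __local memory, so each global value
 * is read once per work-group and reused TILE times.
 * Assumes: square matrices, row-major layout, n % TILE == 0. */
#define TILE 16

__kernel void matmul_tiled(__global const float *A,
                           __global const float *B,
                           __global float *C,
                           const int n)
{
    __local float As[TILE][TILE];
    __local float Bs[TILE][TILE];

    const int row = get_global_id(1);   /* row of C this work-item computes */
    const int col = get_global_id(0);   /* column of C this work-item computes */
    const int lr  = get_local_id(1);
    const int lc  = get_local_id(0);

    float acc = 0.0f;
    for (int t = 0; t < n / TILE; t++) {
        /* Cooperatively load one tile of A and one tile of B. */
        As[lr][lc] = A[row * n + t * TILE + lc];
        Bs[lr][lc] = B[(t * TILE + lr) * n + col];
        barrier(CLK_LOCAL_MEM_FENCE);

        for (int k = 0; k < TILE; k++)
            acc += As[lr][k] * Bs[k][lc];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    C[row * n + col] = acc;
}
```

I don't know whether this style of kernel is what the SDK samples already do internally, which is part of what I'm asking.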
As the title of the post says, I'm looking for a sanity check here, to make sure I'm not doing something screwy with the mini-benchmarks. Thanks!
$ ./MatrixMultiplication --device gpu -x 2048 -y 2048 -z 2048 -t -q
MatrixA      MatrixB      Time(sec)
2048x2048    2048x2048    95.0832

$ ./MatrixMultiplication --device cpu -x 2048 -y 2048 -z 2048 -t -q
MatrixA      MatrixB      Time(sec)
2048x2048    2048x2048    81.8752

$ ./MersenneTwister -q -t --device cpu -x 1000000
Generated Numbers    Time(sec)    Numbers/sec
2000000              0.506        3.95257e+06

$ ./MersenneTwister -q -t --device gpu -x 1000000
Generated Numbers    Time(sec)    Numbers/sec
2000000              4.133        483910