Benchmark CPU/GPU

Discussion created by fir3ball on Mar 8, 2010
Latest reply on Mar 15, 2010 by fir3ball
N-Body implementations

How does OpenCL code running on the "CPU" device typically compare to CPU code built with a standard compiler (C++/Fortran)?

At first glance, looking at the n-body sample, the CPU/GPU speedup is impressive:

  • OpenCL 1 CPU: 1.4 Gflops
  • OpenCL 2 CPU: 2.8 Gflops
  • OpenCL 4 CPU: 6 Gflops
  • OpenCL GPU: 250 Gflops

On a single CPU, a native compiler still seems to have an edge over OpenCL CPU:

  • OpenCL 1 CPU: 1.4 Gflops
  • g++ : 1.5 Gflops
  • Intel C : 2.15 Gflops
  • Intel Fortran: 2.20 Gflops

On 4 CPUs:

  • OpenCL 4 CPU: 6.2 Gflops
  • Fortran OpenMP: 9.7 Gflops

My main goal here is to evaluate the viability of the CPU mode for OpenCL code, and so far the numbers suggest it is still well worth keeping a natively compiled CPU branch.
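
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of what a natively compiled CPU baseline could look like in C++. It is not the code behind the numbers above. The names (`Body`, `computeAccelerations`, `gflops`, `benchmark`) are made up for illustration, and the figure of 20 flops per body-body interaction is an assumed convention that is often used for n-body Gflops numbers; the post does not say which count the sample uses.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch (not taken from the n-body sample).
struct Body  { float x, y, z, mass; };
struct Accel { float x, y, z; };

// One direct-sum pass: every body feels every body, O(N^2) interactions.
// eps2 is a softening term; it must be > 0 so the i == j term stays finite
// (that term then contributes exactly zero because dx = dy = dz = 0).
std::vector<Accel> computeAccelerations(const std::vector<Body>& bodies, float eps2) {
    std::vector<Accel> acc(bodies.size(), Accel{0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (std::size_t j = 0; j < bodies.size(); ++j) {
            const float dx = bodies[j].x - bodies[i].x;
            const float dy = bodies[j].y - bodies[i].y;
            const float dz = bodies[j].z - bodies[i].z;
            const float r2 = dx * dx + dy * dy + dz * dz + eps2;
            const float invR = 1.0f / std::sqrt(r2);
            const float s = bodies[j].mass * invR * invR * invR;  // m_j / r^3
            ax += dx * s;
            ay += dy * s;
            az += dz * s;
        }
        acc[i] = Accel{ax, ay, az};
    }
    return acc;
}

// Converts a timing into Gflops. The default of 20 flops per interaction
// is an ASSUMPTION (a common convention), not a measured value.
double gflops(std::size_t n, int passes, double seconds,
              double flopsPerInteraction = 20.0) {
    assert(seconds > 0.0);
    const double interactions =
        static_cast<double>(n) * static_cast<double>(n) * passes;
    return interactions * flopsPerInteraction / seconds / 1e9;
}

// Times `passes` force passes over a non-empty set of bodies and
// reports the result in Gflops.
double benchmark(const std::vector<Body>& bodies, int passes, float eps2) {
    const auto start = std::chrono::steady_clock::now();
    float sink = 0.0f;
    for (int p = 0; p < passes; ++p) {
        sink += computeAccelerations(bodies, eps2)[0].x;
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile float keep = sink;  // keeps the optimizer from dropping the work
    (void)keep;
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return gflops(bodies.size(), passes, seconds);
}
```

Calling `benchmark(bodies, 10, 1e-4f)` on a few thousand bodies gives a single-threaded figure. Whatever flop count is chosen, the same one has to be used for the OpenCL and the native runs, otherwise the Gflops numbers are not comparable.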


Any comments on this? Am I looking at the worst test case for this?

(Yes, I know that in the end it's really algorithm dependent, and that a speedup seen here might not carry over to other problems. Also, the CPU OpenCL driver is quite new and subject to improvements.)