Hello all,

I am new to OpenCL development and I am currently doing some benchmark tests using OpenCL on an AMD Radeon HD 7870.

The code I have written in JOCL (the Java bindings for OpenCL) simply adds two 2D arrays (z = x + y), but it repeats the addition many times (z = x + y + y + y + y + y + y ...).

Each of the two arrays is 500 by 501, and I loop over the number of times they are added together on the GPU: first once, then ten times, then one hundred times, and so on in powers of ten.
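For readers who want to see the workload spelled out, here is a minimal plain-Java sketch of the same computation on the CPU. It is not the JOCL kernel from this post; the class and method names (`RepeatedAddReference`, `repeatedAdd`, `additionCount`) are hypothetical, and it assumes the 2D arrays are stored flattened as `float[]` and that each iteration performs exactly one addition per element.

```java
public class RepeatedAddReference {
    // Element-wise z = x + y + y + ... with y added `iterations` times.
    // Assumption: arrays are flattened 2D arrays of equal length.
    public static float[] repeatedAdd(float[] x, float[] y, int iterations) {
        float[] z = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            float acc = x[i];
            for (int k = 0; k < iterations; k++) {
                acc += y[i]; // one floating-point addition per iteration
            }
            z[i] = acc;
        }
        return z;
    }

    // Total floating-point additions for one run over a rows x cols array.
    // long arithmetic avoids int overflow at large iteration counts.
    public static long additionCount(int rows, int cols, long iterations) {
        return (long) rows * cols * iterations;
    }

    public static void main(String[] args) {
        int rows = 500, cols = 501;
        float[] x = new float[rows * cols];
        float[] y = new float[rows * cols];
        java.util.Arrays.fill(x, 1.0f);
        java.util.Arrays.fill(y, 0.5f);

        float[] z = repeatedAdd(x, y, 10);
        System.out.println(z[0]);                           // 6.0
        System.out.println(additionCount(rows, cols, 10));  // 2505000
    }
}
```

Counting one addition per element per iteration is what makes the operation count equal to 500 * 501 * iterations for a single run.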

The maximum number of iterations that I loop to is 100,000,000. Below is what the log file looks like when I run my code (counter is the number of times my program executes in 5 seconds):

Number of Iterations: 1

Counter: 87

FLOPS Rate: 0.0043310947 GFLOPs/s

Number of Iterations: 10

Counter: 88

FLOPS Rate: 0.043691948 GFLOPs/s

Number of Iterations: 100

Counter: 84

FLOPS Rate: 0.41841218 GFLOPs/s

Number of Iterations: 1000

Counter: 71

FLOPS Rate: 3.5104263 GFLOPs/s

Number of Iterations: 10000

Counter: 8

FLOPS Rate: 3.8689642 GFLOPs/s

Number of Iterations: 100000

Counter: 62

FLOPS Rate: 309.70895 GFLOPs/s

Number of Iterations: 1000000

Counter: 17

FLOPS Rate: 832.0814 GFLOPs/s

Number of Iterations: 10000000

Counter: 2

FLOPS Rate: 974.4635 GFLOPs/s

Number of Iterations: 100000000

Counter: 1

FLOPS Rate: 893.7945 GFLOPs/s

Do these numbers make sense? I feel that 0.97 TeraFLOPS is quite high and that I must be calculating the number of FLOPs incorrectly.

Just for reference, I am calculating the FLOPS in the following way:

FLOPS = counter*(500)*(501)*(iterations)/(time_elapsed)
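As a sanity check on the formula above, here is a small Java sketch that evaluates it. The class and method names (`FlopsRate`, `gflops`) are hypothetical, and the 5.0 s elapsed time is an assumption taken from the "5 seconds" mentioned earlier; the real measured time may differ slightly. The product is computed in `double` because 500 * 501 * 100,000,000 does not fit in a 32-bit `int`, which is worth ruling out as a source of error.

```java
public class FlopsRate {
    // GFLOPS = counter * rows * cols * iterations / elapsedSeconds / 1e9
    // The cast to double keeps the large product from overflowing an int.
    public static double gflops(long counter, int rows, int cols,
                                long iterations, double elapsedSeconds) {
        double operations = (double) counter * rows * cols * iterations;
        return operations / elapsedSeconds / 1e9;
    }

    public static void main(String[] args) {
        // Log entry: Number of Iterations = 10000000, Counter = 2.
        // Assumption: exactly 5.0 s elapsed.
        double rate = gflops(2, 500, 501, 10_000_000L, 5.0);
        System.out.println(rate + " GFLOPS"); // 1002.0 GFLOPS
    }
}
```

With exactly 5.0 s this gives 1002 GFLOPS, slightly above the logged 974.4635, which would be consistent with the measured elapsed time being a little longer than 5 seconds.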

Any help with this issue will be greatly appreciated.

Thank you

The theoretical peak of the Radeon HD 7870 is about 2.5 TFLOPS, but that figure is for the MAD instruction (x*y + z), which counts as two floating-point operations per instruction. With a plain ADD you can reach only about 1.2 TFLOPS, so your 0.9 TFLOPS is fairly close to the theoretical maximum.
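The peak figures can be reproduced with a rough back-of-the-envelope calculation, sketched here in Java. The hardware numbers are assumptions taken from the published specifications of the HD 7870 GHz Edition (1280 stream processors at a 1000 MHz engine clock) and should be checked against the actual card; the class and method names (`PeakEstimate`, `peakGflops`) are hypothetical.

```java
public class PeakEstimate {
    // Peak GFLOPS = stream processors * clock (GHz) * operations per instruction.
    public static double peakGflops(int streamProcessors, double clockGhz,
                                    int opsPerInstruction) {
        return streamProcessors * clockGhz * opsPerInstruction;
    }

    public static void main(String[] args) {
        int streamProcessors = 1280; // assumption: HD 7870 GHz Edition spec
        double clockGhz = 1.0;       // assumption: 1000 MHz engine clock

        double madPeak = peakGflops(streamProcessors, clockGhz, 2); // MAD = 2 ops
        double addPeak = peakGflops(streamProcessors, clockGhz, 1); // ADD = 1 op
        System.out.println(madPeak); // 2560.0
        System.out.println(addPeak); // 1280.0

        // Best rate from the log in the question.
        double measured = 974.4635;
        System.out.println(measured / addPeak); // roughly 0.76
    }
}
```

Under these assumptions the best measured rate is roughly three quarters of the ADD-only peak.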