I am new to OpenCL development and am currently running some benchmarks with OpenCL on an AMD Radeon HD 7870.
The code, written in JOCL (the Java bindings for OpenCL), simply adds two 2D arrays (z = x + y), but it repeats the addition many times (z = x + y + y + y + ...).
Each array is 500 by 501, and I vary the number of times they are added together on the GPU: first once, then ten times, then one hundred times, and so on, increasing by a factor of ten each step.
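For concreteness, here is a plain-Java sketch of the computation described above, run on the CPU. It is not the actual JOCL kernel: the class and method names are hypothetical, and it assumes single-precision floats stored in flattened 1D arrays. It only shows that each element receives one addition per iteration, which is the operation count the benchmark relies on.

```java
import java.util.Arrays;

public class RepeatedAdd {
    // Hypothetical CPU reference: z = x + y + y + ... with `iterations` additions of y.
    public static float[] addRepeated(float[] x, float[] y, int iterations) {
        float[] z = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            float acc = x[i];
            for (int k = 0; k < iterations; k++) {
                acc += y[i]; // one floating-point addition per iteration
            }
            z[i] = acc;
        }
        return z;
    }

    public static void main(String[] args) {
        int rows = 500, cols = 501; // array shape from the post, flattened to 1D
        float[] x = new float[rows * cols];
        float[] y = new float[rows * cols];
        Arrays.fill(x, 1.0f);
        Arrays.fill(y, 0.5f);
        float[] z = addRepeated(x, y, 10);
        System.out.println("z[0] = " + z[0]);
    }
}
```

With this structure, one run performs 500 * 501 * iterations additions in total.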
The largest iteration count I test is 100,000,000. Below is the log from a run of my code (counter, which appears in the formula further down, is the number of times my program executes in 5 seconds):
Number of Iterations: 1
FLOPS Rate: 0.0043310947 GFLOPs/s
Number of Iterations: 10
FLOPS Rate: 0.043691948 GFLOPs/s
Number of Iterations: 100
FLOPS Rate: 0.41841218 GFLOPs/s
Number of Iterations: 1000
FLOPS Rate: 3.5104263 GFLOPs/s
Number of Iterations: 10000
FLOPS Rate: 3.8689642 GFLOPs/s
Number of Iterations: 100000
FLOPS Rate: 309.70895 GFLOPs/s
Number of Iterations: 1000000
FLOPS Rate: 832.0814 GFLOPs/s
Number of Iterations: 10000000
FLOPS Rate: 974.4635 GFLOPs/s
Number of Iterations: 100000000
FLOPS Rate: 893.7945 GFLOPs/s
Do these numbers make sense? I feel that 0.97 TeraFLOPS is quite high and that I must be calculating the number of FLOPs incorrectly.
Just for reference, I am calculating the FLOPS in the following way:
FLOPS = counter * 500 * 501 * iterations / time_elapsed
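As a sanity check, the formula above can be sketched in Java as follows. This is a minimal sketch, not the code from my benchmark: the method name and parameter types are assumptions, and the result is divided by 1e9 to report GFLOPS. One thing it makes explicit is that 500 * 501 * 100,000,000 is about 2.5e13, which does not fit in a 32-bit `int`, so the product has to be computed in `double` (or `long`) to avoid silent overflow.

```java
public class FlopsRate {
    // Hypothetical helper: GFLOPS from the formula
    // counter * rows * cols * iterations / time_elapsed.
    public static double gflops(long counter, int rows, int cols,
                                long iterations, double seconds) {
        // Cast first so the whole product is evaluated in double, not int.
        double flops = (double) counter * rows * cols * iterations;
        return flops / seconds / 1e9;
    }

    public static void main(String[] args) {
        // Made-up inputs for illustration only: one run of 1000 iterations in 1 second.
        System.out.println(gflops(1, 500, 501, 1000, 1.0) + " GFLOPS");

        // The same product in int arithmetic wraps around at the largest iteration count.
        int wrapped = 500 * 501 * 100_000_000;
        System.out.println("int product (overflowed): " + wrapped);
    }
}
```

If the measured rate changes after switching the product to `double`, the original numbers were probably affected by integer overflow rather than by the GPU itself.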
Any help with this issue will be greatly appreciated.