
Benchmarking using OpenCL on AMD GPU

Question asked by ke0m on Nov 10, 2013
Latest reply on Nov 23, 2013 by nou

Hello all,


I am new to OpenCL development and I am currently doing some benchmark tests using OpenCL on an AMD Radeon HD 7870.


The code, which I have written with JOCL (the Java bindings for OpenCL), simply adds two 2D arrays (z = x + y), but it repeats the addition many times (z = x + y + y + y + y + y + y + ...).


Each of the two arrays is 500 by 501, and I vary the number of additions performed on the GPU in each run. First I add them once, then ten times, then one hundred times, and so on, increasing by a factor of ten each time.


The maximum number of iterations I go up to is 100,000,000. Below is the log file from a run of my code (Counter is the number of times my program executes in 5 seconds):


Number of Iterations: 1
Counter: 87
FLOPS Rate: 0.0043310947 GFLOPs/s

Number of Iterations: 10
Counter: 88
FLOPS Rate: 0.043691948 GFLOPs/s

Number of Iterations: 100
Counter: 84
FLOPS Rate: 0.41841218 GFLOPs/s

Number of Iterations: 1000
Counter: 71
FLOPS Rate: 3.5104263 GFLOPs/s

Number of Iterations: 10000
Counter: 8
FLOPS Rate: 3.8689642 GFLOPs/s

Number of Iterations: 100000
Counter: 62
FLOPS Rate: 309.70895 GFLOPs/s

Number of Iterations: 1000000
Counter: 17
FLOPS Rate: 832.0814 GFLOPs/s

Number of Iterations: 10000000
Counter: 2
FLOPS Rate: 974.4635 GFLOPs/s

Number of Iterations: 100000000
Counter: 1
FLOPS Rate: 893.7945 GFLOPs/s
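
For reference, the Counter value in the log above comes from a fixed-window timing loop. Here is a minimal sketch of that kind of loop, not my actual code: the class name TimedRuns, the method countRuns and the Runnable stand-in for the kernel launch are all made up for illustration. In a real OpenCL benchmark the work would presumably need to block until the GPU has finished (for example by calling clFinish on the command queue) before the clock is read.

```java
// Hypothetical sketch: count how many times a unit of work completes
// within a fixed time window, and record the time actually elapsed.
final class TimedRuns {

    // Returns {counter, elapsedNanos}. The work always runs at least once,
    // so elapsedNanos can exceed windowNanos when a single run is slow.
    static long[] countRuns(Runnable work, long windowNanos) {
        long start = System.nanoTime();
        long counter = 0;
        long elapsed;
        do {
            // Assumption: work.run() returns only after the work is done
            // (for a GPU kernel, after the command queue has been drained).
            work.run();
            counter++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < windowNanos);
        return new long[] { counter, elapsed };
    }

    public static void main(String[] args) {
        // Stand-in workload; a real benchmark would launch the kernel here.
        Runnable fakeWork = () -> Math.sqrt(12345.678);
        long[] result = countRuns(fakeWork, 50_000_000L); // 50 ms window
        System.out.println("Counter: " + result[0]);
        System.out.println("Elapsed (s): " + result[1] / 1e9);
    }
}
```

Using the measured elapsed time rather than the nominal 5 seconds matters for the large iteration counts, where a single run may overshoot the window.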


Do these numbers make sense? I feel that 0.97 TeraFLOPS is quite high and that I must be calculating the number of FLOPs incorrectly.


Just for reference, I am calculating the FLOPS in the following way:


FLOPS = counter*(500)*(501)*(iterations)/(time_elapsed)
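
Written out in Java, the formula above looks roughly like the sketch below. The class name FlopsEstimate, the method gflops and its parameter names are made up for illustration, and it assumes one floating-point add per array element per iteration and an elapsed time in seconds. The product is accumulated in a double because counter * 500 * 501 * iterations exceeds the range of a 32-bit int at the larger iteration counts.

```java
// Hypothetical sketch of the FLOPS formula from this post.
final class FlopsEstimate {

    // Assumes one floating-point operation per element per iteration.
    static double gflops(long counter, long rows, long cols,
                         long iterations, double elapsedSeconds) {
        // Promote to double first so the product cannot overflow an integer type.
        double totalOps = (double) counter * rows * cols * iterations;
        return totalOps / elapsedSeconds / 1e9;
    }

    public static void main(String[] args) {
        // First log entry: 1 iteration, counter of 87, nominal 5 second window.
        System.out.println(gflops(87, 500, 501, 1, 5.0) + " GFLOPS");
    }
}
```

With the nominal 5 seconds this gives a value close to, but not exactly, the first logged rate, since the logged rate presumably used the measured elapsed time.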


Any help with this issue will be greatly appreciated.


Thank you