I've been reading research papers to compare a number of architectures, including CPUs, the Cell BE, and GPUs. GFLOPS is used as the unit of measurement throughout, but the papers never state exactly how the figures are obtained.
Is it as simple as looping the program so that it runs for roughly a second (or longer, I assume, for greater accuracy), counting the number of floating-point operations (+, -, /, *, etc.) in the kernel, and dividing that count by the elapsed time?
If so, why do I see both average GFLOPS and peak GFLOPS reported, and how is each of these numbers determined?
And, if the answers above don't already cover it, how would I go about measuring GFLOPS on my own CPU and GPU for comparison, using a specific algorithm such as Mersenne Twister?
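To make the loop-and-count idea concrete, here is a minimal sketch of what I have in mind, written in Python. The names `kernel`, `FLOPS_PER_ITER` and `measure_gflops` are my own placeholders, and the toy kernel (one multiply and one add per iteration) is only an assumption standing in for a real algorithm. Interpreter overhead means the number it prints will be far below what the hardware can do, so this only illustrates the bookkeeping, not a realistic benchmark.

```python
import time

# Assumption: the kernel performs exactly this many floating-point
# operations per loop iteration (counted by hand from the source).
FLOPS_PER_ITER = 2


def kernel(n):
    """Toy kernel: one multiply and one add per iteration."""
    acc = 0.0
    x = 1.000001
    for _ in range(n):
        acc = acc * x + 0.5
    return acc


def measure_gflops(iterations=1_000_000, min_seconds=1.0):
    """Repeat the kernel until at least min_seconds have elapsed,
    then return (hand-counted operations) / (elapsed seconds) / 1e9."""
    total_flops = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        kernel(iterations)
        total_flops += FLOPS_PER_ITER * iterations
        elapsed = time.perf_counter() - start
    return total_flops / elapsed / 1e9


if __name__ == "__main__":
    print(f"{measure_gflops():.4f} GFLOPS achieved by this kernel")
```

As far as I can tell, this would give an achieved (measured) figure for one particular kernel, which is part of why I'm unsure where the peak numbers in the papers come from.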
Thanks.
-Matt