I'm collecting performance counters with GPUPerfAPI 2.9 for a vector addition (Saxpy) kernel on the A8 integrated GPU.
The kernel is very simple: y[i] = a * x[i] + y[i].
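For reference, here is a minimal CPU-side sketch of the same computation in plain Python. It is not the actual GPU kernel; the function name `saxpy` and the list-based arguments are only for illustration.

```python
def saxpy(a, x, y):
    """Return a new list holding a * x[i] + y[i] for every index i."""
    # Hypothetical helper for illustration; the real kernel runs on the GPU.
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Each output element depends only on the matching elements of x and y, so on the GPU every work-item can handle one index independently.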
As the input size increases from 64KB to 4MB, the number of wavefronts increases accordingly. However, when the input size is 8MB, the number of wavefronts drops.
So I wonder whether this means that wavefronts are reused.
In addition, other GPU counters also drop in the middle of the input size range, as you can see in the file linked below.
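For comparison, this is a rough sketch of the wavefront count I would expect to see. It rests on several assumptions that may not hold: the input size is the size in bytes of one vector, elements are 4-byte floats, one work-item processes one element, and a wavefront holds 64 work-items. The names `expected_wavefronts`, `WAVEFRONT_SIZE` and `BYTES_PER_FLOAT` are made up for this example.

```python
WAVEFRONT_SIZE = 64   # assumption: work-items per wavefront on this GPU
BYTES_PER_FLOAT = 4   # assumption: single-precision elements


def expected_wavefronts(input_bytes, wavefront_size=WAVEFRONT_SIZE):
    """Estimate wavefronts launched when one work-item handles one element."""
    elements = input_bytes // BYTES_PER_FLOAT
    # Integer ceiling division: a partially filled wavefront still counts.
    return -(-elements // wavefront_size)


if __name__ == "__main__":
    # Input sizes doubling from 64KB up to 8MB.
    size_kb = 64
    while size_kb <= 8192:
        print(size_kb, "KB ->", expected_wavefronts(size_kb * 1024), "wavefronts")
        size_kb *= 2
```

Under these assumptions the expected count keeps doubling with the input size, including at 8MB, which is why the drop in the measured counter surprises me.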
Can you help explain this behavior?
Thank you very much!
Saxpy counters: http://www.gabrielecocco.it/Saxpy.htm