Strange fluctuating completion time on APU GPU

Question asked by cadorino on Sep 10, 2014
Latest reply on Sep 22, 2014 by dipak


I'm working on a framework for classification and scheduling of computations on heterogeneous multi-device platforms.

I recently added a simple computation to the set of training samples: the column sums of a matrix (output[i] = Sum(input[r, i]) for all r).
The kernel code is the following (it looks a bit strange because it's generated from an F# function):


kernel void SumCols(global float*matA, global float*c, int matA_length_0, int matA_length_1, int c_length_0) {
   int r = get_global_id(0);
   float accum = 0;
   for(int i = 0; i <= (matA_length_1) - (1);i++) {
      accum = (accum) + (matA[((r) * (matA_length_0)) + (i)]);
   }
   c[r] = accum;
}
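
To sanity-check what the GPU returns, a host-side reference could look like the sketch below. It is not part of the framework: the function name sum_cols_reference is made up for illustration, and the outer loop over r simply stands in for get_global_id(0). The indexing is copied from the kernel as written.

```c
#include <stddef.h>

/* Hypothetical CPU reference for the SumCols kernel above.
   Each iteration of the outer loop plays the role of one work-item r. */
void sum_cols_reference(const float *matA, float *c,
                        int matA_length_0, int matA_length_1, int c_length_0)
{
    for (int r = 0; r < c_length_0; r++) {
        float accum = 0.0f;
        /* Same bounds and same index expression as the generated kernel. */
        for (int i = 0; i <= matA_length_1 - 1; i++) {
            accum += matA[(size_t)r * (size_t)matA_length_0 + (size_t)i];
        }
        c[r] = accum;
    }
}
```

Comparing this output with the buffer read back from the device (within a small float tolerance for large matrices) should at least confirm that the timing runs are computing the same thing at every size.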

The completion time fluctuates strangely as I vary the input matrix size from 64x64 to 2048x2048 (element type is float32) in steps of 64.
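
For reference, the sweep described above can be written down as the small sketch below. The helper names (sweep_count, sweep_side, input_bytes) are hypothetical and only restate the numbers given here: square matrices, side lengths from 64 to 2048 in steps of 64, and 4-byte float elements.

```c
#include <stddef.h>

/* Sweep parameters taken from the description: 64x64 up to 2048x2048, step 64. */
enum { SWEEP_STEP = 64, SWEEP_MAX = 2048 };

/* Number of matrix sizes in the sweep (64, 128, ..., 2048). */
int sweep_count(void)
{
    return SWEEP_MAX / SWEEP_STEP;
}

/* Side length of the k-th matrix in the sweep, k counted from 0. */
int sweep_side(int k)
{
    return (k + 1) * SWEEP_STEP;
}

/* Size in bytes of a square float matrix with the given side length. */
size_t input_bytes(int side)
{
    return (size_t)side * (size_t)side * sizeof(float);
}
```

With 4-byte floats this gives 32 data points, with the input growing from 16 KiB at 64x64 to 16 MiB at 2048x2048.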
The integrated GPU is a 7660D in the A10-5800K APU.


The following graph shows the completion time as the input size varies. A CSV with the numbers is available here: featureBasedScheduling/Sum By Cols-Table 1.csv at master · morellid/featureBasedScheduling · GitHub.
Any hint about what may cause this strange behaviour?
SumByCols IGPU.jpg