7 Replies Latest reply on Sep 22, 2014 7:32 AM by dipak

    Strange fluctuating completion time on APU GPU



      I'm working on a framework for classification and scheduling of computations on heterogeneous multi-device platforms.

      I recently added to the training-sample set a simple computation that sums the columns of a matrix (output[i] = Sum(input[r, i]) for all r).
      The kernel code is the following (it looks a bit strange because it is generated from an F# function):


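      kernel void SumCols(global float* matA, global float* c, int matA_length_0, int matA_length_1, int c_length_0) {
         int r = get_global_id(0);   // one work item per output element
         float accum = 0;
         for(int i = 0; i <= (matA_length_1) - (1);i++) {
            // row stride is matA_length_0; equal to matA_length_1 for the square inputs used here
            accum = (accum) + (matA[((r) * (matA_length_0)) + (i)]);
         }
         c[r] = accum;   // write once, after the loop
      }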

      I'm getting a strangely fluctuating completion time when I vary the input matrix size from 64x64 to 2048x2048 (float32 elements) in steps of 64.
      The integrated GPU is the Radeon HD 7660D in the A10-5800K APU.


      The following graph shows the completion time as a function of input size. A CSV with the raw numbers is available on GitHub as Sum By Cols-Table 1.csv on the master branch of morellid/featureBasedScheduling.
      Any hint about what may cause this strange behaviour?
      [Graph: SumByCols IGPU.jpg, completion time vs. input size on the integrated GPU]