0 Replies Latest reply on May 28, 2009 5:58 PM by karx11erx

    Speeding up kernels processing doubles

      Need advice

      Application: Transformation of 2D geographical coordinates in double format

      If I have understood correctly, a thread can process at most 4 floats simultaneously, which is equivalent to two doubles. In other words, each thread is always processed by 4 ALUs (or whatever they are called here) working together.

      My kernels look somewhat like this:

      kernel void transform (double xIn<>, double yIn<>, out double xOut<>, out double yOut<>)
      {
          xOut = >some function on xIn<;
          yOut = >some function on yIn<;
      }

      Now, if I used double2 instead of double, would that theoretically (at least for simple kernels) double the kernel throughput, since it would keep all of the thread's ALUs busy?
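
      To make the two layouts concrete, here is a minimal CPU-side sketch in plain C (not Brook+). It only illustrates the data packing: the `double2` struct, the helper `pack_pairs`, and the placeholder functions `fx` and `fy` are assumptions made up for illustration, since the real transformations are not shown above. Each loop iteration stands in for one kernel invocation, so the packed variant needs half as many invocations for the same number of points. Whether that actually doubles throughput on the hardware is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a two-component double vector type. */
typedef struct { double x, y; } double2;

/* Placeholder transformations (assumed; the real ones are not given). */
static double fx(double x) { return 2.0 * x + 1.0; }
static double fy(double y) { return 0.5 * y - 3.0; }

/* Scalar layout: one invocation per point, one double per stream element. */
static void transform_scalar(const double *xIn, const double *yIn,
                             double *xOut, double *yOut, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        xOut[i] = fx(xIn[i]);
        yOut[i] = fy(yIn[i]);
    }
}

/* Packed layout: each stream element carries two consecutive values,
 * so one invocation handles two points. n2 is the number of double2
 * elements, i.e. half the number of points. */
static void transform_packed(const double2 *xIn, const double2 *yIn,
                             double2 *xOut, double2 *yOut, size_t n2)
{
    for (size_t i = 0; i < n2; ++i) {
        xOut[i].x = fx(xIn[i].x);
        xOut[i].y = fx(xIn[i].y);
        yOut[i].x = fy(yIn[i].x);
        yOut[i].y = fy(yIn[i].y);
    }
}

/* Host-side repacking of n doubles into n/2 double2 elements. */
static void pack_pairs(const double *src, double2 *dst, size_t n)
{
    assert(n % 2 == 0); /* an odd count would need a padding element */
    for (size_t i = 0; i < n / 2; ++i) {
        dst[i].x = src[2 * i];
        dst[i].y = src[2 * i + 1];
    }
}
```

      Both variants produce the same values; only the number of invocations and the width of each stream element differ.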