Speeding up kernels processing doubles

Discussion created by karx11erx on May 28, 2009
Need advice

Application: Transformation of 2D geographical coordinates in double format

If I have understood correctly, a thread can process at most 4 floats simultaneously, which is equivalent to two doubles. In other words, each thread is always processed by 4 ALUs (or whatever they are called here) working together.

My kernels look somewhat like this:

kernel void transform (double xIn<>, double yIn<>, out double xOut<>, out double yOut<>)
{
    xOut = <some function of xIn>;
    yOut = <some function of yIn>;
}

If I used double2 instead of double, would that theoretically (at least for simple kernels) double the kernel throughput, since it would keep all of the thread's ALUs busy?
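To make the question concrete, here is a rough plain-C model of the two data layouts I have in mind. It is only a CPU-side sketch, not Brook+ code: the `double2` struct is a stand-in for the GPU vector type, `fx` and `fy` are made-up placeholders for the real transformations, and the function names are hypothetical. It shows only that the packed variant handles two points per element; it says nothing about actual GPU throughput.

```c
#include <stddef.h>

/* Stand-in for the GPU's double2 vector type (assumption for this sketch). */
typedef struct { double x, y; } double2;

/* Placeholder transformations; the real ones are not shown in this post. */
static double fx(double x) { return 2.0 * x + 1.0; }
static double fy(double y) { return 0.5 * y - 3.0; }

/* Scalar layout: each element (one "thread") handles a single point. */
void transform_scalar(const double *xIn, const double *yIn,
                      double *xOut, double *yOut, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        xOut[i] = fx(xIn[i]);
        yOut[i] = fy(yIn[i]);
    }
}

/* Packed layout: each double2 element holds the values of two consecutive
 * points, so one "thread" handles two points. nPairs is half the point count. */
void transform_packed(const double2 *xIn, const double2 *yIn,
                      double2 *xOut, double2 *yOut, size_t nPairs)
{
    for (size_t i = 0; i < nPairs; ++i) {
        xOut[i].x = fx(xIn[i].x);
        xOut[i].y = fx(xIn[i].y);
        yOut[i].x = fy(yIn[i].x);
        yOut[i].y = fy(yIn[i].y);
    }
}
```

Both variants produce the same results; the packed one needs an even number of points (or padding) and half as many elements.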