I have noticed that when I use the reduction kernel with type float, it produces errors: the result differs from the result computed on the CPU.
For small arrays of floats holding integer values, the reduction kernel produces the same result as the reference CPU version, but as the array gets bigger the sum goes wrong. For example, instead of 4087041 I get 4087044. It is a small error, but it is still an error, and it grows as the array of data gets bigger.
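To help separate a real kernel problem from ordinary single-precision rounding, here is a minimal CPU-side sketch in Python. It is not the Stream SDK kernel and makes no claim about how that kernel orders its additions. The helper names (f32, sequential_sum_f32, tree_sum_f32) are made up for illustration. It emulates float arithmetic by rounding every intermediate result to 32 bits, then compares a plain left-to-right sum with a pairwise (tree-style) sum of the kind a parallel reduction typically performs.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sequential_sum_f32(values):
    """Left-to-right sum, rounding to single precision after every add
    (what a simple CPU loop over a float array does)."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


def tree_sum_f32(values):
    """Pairwise sum, rounding to single precision after every add.
    This is only an assumed stand-in for a parallel reduction order."""
    level = [f32(v) for v in values]
    if not level:
        return 0.0
    while len(level) > 1:
        # Add neighbours pairwise; an odd element at the end is carried over.
        nxt = [f32(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    # Whole numbers whose total stays below 2**24: every partial sum is
    # exactly representable, so both orders agree with the exact result.
    small = [float(i) for i in range(1, 2001)]
    print(sequential_sum_f32(small), tree_sum_f32(small), sum(small))

    # Once partial sums exceed 2**24, single precision can no longer hold
    # every integer, and the two orders can round differently.
    big = [16777216.0] + [1.0] * 10
    print(sequential_sum_f32(big), tree_sum_f32(big), sum(big))
```

Single precision represents every integer up to 2^24 = 16777216 exactly, so if the inputs really are non-negative whole numbers, a total such as 4087041 is still in the exact range and summation order alone would not be expected to explain a difference of 3. If the inputs have fractional parts, or partial sums go above 2^24, differences like this can come from rounding order rather than from the kernel.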
I have an ATI Radeon 4870, Windows 7 64-bit, and Stream SDK 2.0.
I hope this gets fixed in the next version of the SDK and the drivers.