Different results on x86, NVidia and ATI
The expression 1.0f / 96.0f is evaluated at runtime as 1.0f / x, with x = 96.0f.
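For context, the division happens in a kernel along these lines (a minimal sketch; the kernel name and buffer layout are illustrative, not taken from the actual code):

```c
/* Minimal OpenCL C kernel sketch (kernel name and buffer layout are
   assumptions, not from the original post). The divisor is read from a
   buffer at runtime so the compiler cannot constant-fold the division. */
__kernel void reciprocal(__global const float *x,
                         __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = 1.0f / x[i];   /* x[i] = 96.0f in the case discussed */
}
```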
On x86 (Westmere) and NVidia it gives:
1.041666697711e-02 (absolute error vs. the exact 1.0416666...e-02: ~3.1e-10)
On an ATI HD 6970 (OpenCL 1.1, SDK 2.3) it gives:
1.041666604578e-02 (absolute error vs. the exact 1.0416666...e-02: ~6.2e-10)
So the x86 and NVidia hardware give the correctly rounded answer (the one with the lower error).
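The error figures above can be checked on the host with a small program like this (a sketch in plain C; it compares the two printed results against a double-precision reference for 1/96):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double ref = 1.0 / 96.0;              /* double-precision reference    */
    float x86  = 1.041666697711e-02f;     /* value printed by x86 / NVidia */
    float ati  = 1.041666604578e-02f;     /* value printed by the ATI 6970 */

    printf("x86/NVidia: %.12e  abs err %.1e\n", x86, fabs((double)x86 - ref));
    printf("ATI 6970:   %.12e  abs err %.1e\n", ati, fabs((double)ati - ref));
    /* Expected: errors of roughly 3.1e-10 and 6.2e-10, i.e. the two results
       differ by about one single-precision ULP at this magnitude (~9.3e-10). */
    return 0;
}
```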
What can I do to get full accuracy on the 6970 with OpenCL?