Different results when running the same OpenCL code on different devices

Hi Everybody,

I am using OpenCL to parallelize a Computational Fluid Dynamics (CFD) code. I compiled the same source code in Visual Studio, with the same settings, to build a single executable on Windows 7 Professional.

I then ran that executable on two AMD video cards (FirePro V4900 and FirePro W8100) and got slightly different results, both of which make physical sense.

I am wondering what causes the difference.

A paper suggests that differences in how manufacturers implement fused multiply-add (FMA) from IEEE 754-2008 may be a possible cause. However, since both video cards are from the same manufacturer, I would not expect this to be the problem.

Any insight into this problem would be highly appreciated.

