I am using OpenCL to parallelize a computational fluid dynamics (CFD) code. I compiled the same source code in Visual Studio, with the same settings, into a single executable on Windows 7 Professional.
I then ran that executable on two AMD video cards (FirePro V4900 and FirePro W8100) and got slightly different results, both of which make physical sense.
I am wondering what causes the difference.
A paper suggests that differences in how manufacturers implement the fused multiply-add (FMA) operation from IEEE 754-2008 may be a possible cause. However, since both video cards are from the same manufacturer, I would not expect this to be the problem.
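To make the FMA effect concrete, here is a minimal Python sketch (not OpenCL, and not the code from my solver). The function names `mul_then_add` and `fused_mul_add` are hypothetical, and the fused version is emulated with exact rational arithmetic rather than a hardware instruction. It only shows that rounding once instead of twice can change the result of `a * b + c`.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Unfused: the product is rounded to double, then the sum is rounded again."""
    return a * b + c


def fused_mul_add(a, b, c):
    """Emulated FMA (assumption: stands in for a hardware fma).

    The product and sum are computed exactly with rationals,
    and the result is rounded to double only once at the end.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product a*a = 1 + 2**-29 + 2**-60
# loses its last term when rounded to double precision.
a = 1.0 + 2.0 ** -30
c = -(1.0 + 2.0 ** -29)

unfused = mul_then_add(a, a, c)   # the 2**-60 term is rounded away before the add
fused = fused_mul_add(a, a, c)    # the 2**-60 term survives the single rounding

print("unfused:", unfused)
print("fused:  ", fused)
```

Here the unfused result is exactly zero while the fused result is 2^-60. In an iterative CFD solver, differences of this size can accumulate over many time steps. In OpenCL C, the fused form can be requested explicitly with the built-in `fma(a, b, c)`, and the `FP_CONTRACT` pragma controls whether the compiler may contract a plain `a * b + c`.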
Any insight into this problem would be highly appreciated.