When run on the 6344, the ACML code will use FMA4 instructions and can produce different results than the SSE code that runs on the Phenom. This is because rounding is slightly different for FMA instructions: a fused multiply-add rounds once, after the add, whereas a separate multiply and add each round their own result.
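To make the rounding point concrete, here is a minimal sketch in Python. It is not ACML code and does not use FMA4 hardware; it only emulates a fused multiply-add by computing the exact result with `fractions.Fraction` and rounding once. The function names and the input values are chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: form a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the single rounding step


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply and round, then add and round again (two rounding steps)."""
    return a * b + c


# Illustrative inputs: the exact product is 1 - 2**-60, which is not
# representable as a double and rounds to 1.0 when the multiply is
# rounded on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(separate_multiply_add(a, b, c))  # 0.0
print(fused_multiply_add(a, b, c))     # -2**-60, a tiny nonzero value
```

Both answers are correctly rounded for the operations that produced them; they differ only because the intermediate product is rounded in one case and not in the other.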
It is possible to have ACML use SSE instructions on the 6344; "export ACML_FMA=0" will do this.
The problem is that SSE can only achieve half the floating-point performance of the 6344 Piledriver floating-point unit. It requires FMA instructions to access the full performance.
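If you would rather limit the setting to individual jobs than export it for the whole shell, something along these lines may work. This is only a sketch: `env_with_fma_disabled` is a hypothetical helper, and the child process here is a stand-in that just prints the variable; you would substitute your own ACML-linked executable.

```python
import os
import subprocess
import sys


def env_with_fma_disabled(base=None):
    """Return a copy of the environment with ACML_FMA set to 0."""
    env = dict(os.environ if base is None else base)
    env["ACML_FMA"] = "0"
    return env


# Stand-in job: a child Python process that reports what it sees.
# Replace the command list with the real tool invocation.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['ACML_FMA'])"],
    env=env_with_fma_disabled(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The parent environment is left untouched, so jobs where full FMA performance matters more than reproducibility can still run without the variable.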
As noted in the other post, there does seem to be a problem with alignment that causes incorrect results, and we'll work on finding a solution to that issue.
I have confirmed that running our tool with ACML 5.3.1 and with ACML_FMA set to 0 produces the same results on an "AMD Phenom(tm) II X6 1090T Processor", an "AMD A4-3300 APU with Radeon(tm) HD Graphics" and an "AMD Opteron(tm) Processor 6344". However, I am still seeing differences caused by dgemm (although far fewer) compared to an older "Intel Xeon E5410". I am also getting a third set of answers on a "Dual-Core AMD Opteron(tm) Processor 8220 SE" and a "Dual Core AMD Opteron(tm) Processor 875". I have not yet confirmed whether these last differences are caused by dgemm or by something else.
Our problem is that all of the machines mentioned above are in our server farm, and when we run our regression tests the jobs go to the most lightly loaded machines. Getting different answers can cause tests to fail. The differences I am seeing are small and look like rounding differences, but they can get multiplied and cause some threshold in our tool to trip differently and take a different path.
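To illustrate the kind of effect I mean, here is a toy sketch in Python. It is not our tool and not dgemm; it assumes, purely for illustration, that two machines accumulate the same three terms in a different order. `choose_path` and the 0.6 threshold are made up for the example.

```python
import math


def choose_path(value: float, threshold: float = 0.6) -> str:
    """Hypothetical decision point: branch on a computed value."""
    return "refine" if value > threshold else "accept"


terms = [0.1, 0.2, 0.3]

# Same terms, different accumulation order.
forward = (terms[0] + terms[1]) + terms[2]
backward = (terms[2] + terms[1]) + terms[0]

print(forward == backward)                          # False
print(math.isclose(forward, backward, rel_tol=1e-12))  # True
print(choose_path(forward), choose_path(backward))  # refine accept
```

The two sums agree to within rounding error, yet they land on opposite sides of the threshold, so everything computed after that point diverges.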
We have been running with ACML version 3.0.0 since 2005. This version gives the same answers on all the machines. For many reasons we would like to update to the latest ACML. But getting different answers on different machines is a problem for our testing and potentially for our customers. Do you see a way around this? Will fixing the memory alignment issue mentioned in my other post resolve these differences?