2 Replies Latest reply on Jun 6, 2013 10:07 AM by nickr523

    dgemm giving different results depending on the machine it is run on.


      We are seeing different results from dgemm depending on which machine we run on.  We see these differences when we run with an AMD 6344 processor versus any machine that does not support AVX, FMA3, and FMA4.  I am running with ACML 5.3.1.  I have tried the gfortran64 and gfortran64_fma4 libraries, and both produced the same results on a given machine.  I get consistent answers when running on a variety of Intel and AMD machines that do not support AVX, FMA3, and FMA4.  It is only the AMD 6344 that produces different answers.  This is the only processor I have access to that supports AVX, FMA3, and FMA4.


      Is this a bug?  Is there a setting we can use to get the AMD 6344 to produce the same answers as the other machines?





        • Re: dgemm giving different results depending on the machine it is run on.

          When run on the 6344, the ACML code uses FMA4 instructions and can produce different results than the SSE code that runs on the Phenom.   This is because rounding is slightly different for FMA instructions: a fused multiply-add rounds once, after the add, whereas a separate multiply and add round twice.
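
          As a rough illustration of that rounding difference (this is not ACML code, just a minimal Python sketch), the fused operation is emulated below with exact rational arithmetic followed by a single rounding. The function names and input values are made up for the example.

```python
from fractions import Fraction


def mul_add_separate(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings, like a separate multiply and add."""
    product = a * b        # the product is rounded to a double here
    return product + c     # the sum is rounded again here


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a*b + c, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 is not representable
# as a double and rounds to 1.0 in the two-step version.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_add_separate(a, b, c))  # 0.0
print(mul_add_fused(a, b, c))     # -2**-60, roughly -8.67e-19
```

          Both answers are correctly computed under their own rounding rules; they simply differ in the low-order bits, which is the kind of discrepancy a dgemm built on FMA can show against one built on SSE.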

          It is possible to have ACML use SSE instructions on the 6344; "export ACML_FMA=0" will do this.

          The problem is that SSE can only achieve half the floating point performance of the 6344 PileDriver floating point unit.  It requires FMA instructions to access the full performance.

          As noted in the other post, there does seem to be a problem with alignment that causes incorrect results, and we'll work on finding a solution to that issue.

            • Re: dgemm giving different results depending on the machine it is run on.

              I have confirmed that running our tool with ACML 5.3.1 and with ACML_FMA set to 0 produces the same results on an "AMD Phenom(tm) II X6 1090T Processor", an "AMD A4-3300 APU with Radeon(tm) HD Graphics" and an "AMD Opteron(tm) Processor 6344".  However, I am still seeing differences because of dgemm (although a lot fewer) compared to an older "Intel Xeon E5410".  I am also getting a third set of answers with a "Dual-Core AMD Opteron(tm) Processor 8220 SE" and a "Dual Core AMD Opteron(tm) Processor 875".  I have not yet confirmed whether these last differences are because of dgemm or something else.


              Our problem is that we have all of the above-mentioned machines in our server farm, and when we run our regression tests the jobs go to the most lightly loaded machines.  Getting different answers can cause tests to fail.  The differences I am seeing are small and look like they could be rounding differences, but they can get multiplied and cause some threshold in our tool to trip differently and take a different path.
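
              To make the threshold effect concrete, here is a small Python sketch. It is not taken from our tool; the threshold value and the function names are invented for illustration. Two results that differ only in the last bit land on opposite sides of a hard cutoff, while a tolerance-based comparison treats them as equal.

```python
import math

THRESHOLD = 0.3  # hypothetical cutoff, chosen only for this example


def takes_branch(x: float) -> bool:
    """Hard threshold: a one-bit difference in x can flip the outcome."""
    return x > THRESHOLD


def results_match(x: float, y: float, rel_tol: float = 1e-12) -> bool:
    """Compare two results with a relative tolerance instead of exactly."""
    return math.isclose(x, y, rel_tol=rel_tol)


result_a = 0.1 + 0.2   # 0.30000000000000004 in double precision
result_b = 0.3         # same value mathematically, different in the last bit

print(result_a == result_b)               # False
print(takes_branch(result_a))             # True
print(takes_branch(result_b))             # False
print(results_match(result_a, result_b))  # True
```

              A tolerance like this could help when comparing final outputs in a regression test, but it would not help once a branch inside the tool has already gone the other way.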


              We have been running with ACML version 3.0.0 since 2005.  This version gives the same answers on all the machines.  For many reasons we would like to update to the latest ACML.  But getting different answers on different machines is a problem for our testing and potentially for our customers.  Do you see a way around this?  Will fixing the memory alignment issue mentioned in my other post resolve these differences?