0 Replies Latest reply on Jul 28, 2014 10:25 AM by fedebr89

    ACML v5.3.1 and v6 dgemm segmentation fault

    fedebr89

      Hello,

      I found a strange problem when using these two ACML versions on my development system (a Linux machine with an Intel Core i7-4700HQ, Haswell).

      A segmentation fault occurs when running some example programs based on LifeV (LifeV Project). These programs simulate fluid-structure interaction and mass transport using BLAS and LAPACK functions.

      Using gdb backtraces, I found that the dgemm function is always the one being called when the crash happens.

      Then I found that the main examples shipped with ACML (gfortran64/examples) work, but when I tried to run the examples in the performance subdirectory (time_dgemm), I got:

      for ACML 6:

      Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

      for ACML 5.3.1:

      Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

       

      So I took the time_dgemm executable compiled on my development machine and ran it on a machine with an older Intel CPU (Core 2 Duo T5750), and it worked!

      Then I also tried the same executable on an Intel Xeon E5-4620 and on an AMD Opteron 6272, and it worked in both cases.

      I discovered that the CPUs on which it worked do not support the FMA3 instruction set.
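
      A rough way to check this on each Linux machine is to look at the CPU flags in /proc/cpuinfo, where FMA3 is reported as the flag "fma" and AMD's FMA4 as "fma4". The Python sketch below is only an illustration of that check; the helper names cpu_flags and has_fma3 are made up for this post and are not part of ACML or any other library.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_fma3(cpuinfo_text):
    """True if the FMA3 flag ('fma') is listed; 'fma4' is a different flag."""
    return "fma" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is not available
    flags = cpu_flags(text)
    for name in ("avx", "avx2", "fma", "fma4"):
        print(name, "yes" if name in flags else "no")
```

      Running it on the machine that crashes and on the machines that work should show whether "fma" is the only relevant difference between them.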

      So I think there is a problem with FMA3 support in these two versions of ACML. Indeed, with ACML 4.4.0 there are no problems, even on my dev machine.
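
      To separate the library from both LifeV and the timing example, a single standalone dgemm call can be made against the shared library. The sketch below does this from Python with ctypes, under several assumptions: the library path is given on the command line, the library exports the Fortran-style symbol dgemm_ (as the backtraces suggest), the two trailing arguments are the hidden string lengths that gfortran-style interfaces expect, and the matrix size of 100 is an arbitrary choice. The names column_major and call_dgemm are hypothetical helpers.

```python
import ctypes
import sys


def column_major(rows):
    """Flatten a list-of-rows matrix into a column-major (Fortran order) ctypes array."""
    m, n = len(rows), len(rows[0])
    flat = [rows[i][j] for j in range(n) for i in range(m)]
    return (ctypes.c_double * (m * n))(*flat)


def call_dgemm(lib, a_rows, b_rows):
    """Compute C = A * B through the Fortran BLAS entry point dgemm_ of `lib`."""
    m, k, n = len(a_rows), len(b_rows), len(b_rows[0])
    a, b = column_major(a_rows), column_major(b_rows)
    c = (ctypes.c_double * (m * n))()
    trans = ctypes.c_char(b"N")  # no transposition of A or B
    im, in_, ik = ctypes.c_int(m), ctypes.c_int(n), ctypes.c_int(k)
    alpha, beta = ctypes.c_double(1.0), ctypes.c_double(0.0)
    lib.dgemm_.restype = None  # dgemm is a subroutine, nothing is returned
    lib.dgemm_(ctypes.byref(trans), ctypes.byref(trans),
               ctypes.byref(im), ctypes.byref(in_), ctypes.byref(ik),
               ctypes.byref(alpha), a, ctypes.byref(im),   # lda = m
               b, ctypes.byref(ik),                        # ldb = k
               ctypes.byref(beta), c, ctypes.byref(im),    # ldc = m
               ctypes.c_size_t(1), ctypes.c_size_t(1))     # assumed hidden string lengths
    return [[c[i + j * m] for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python dgemm_check.py /path/to/libacml.so")
    else:
        lib = ctypes.CDLL(sys.argv[1])
        size = 100  # arbitrary size for this sketch
        a = [[float(i + 2 * j) for j in range(size)] for i in range(size)]
        identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
        result = call_dgemm(lib, a, identity)  # A * I should give A back
        print("dgemm result matches" if result == a else "dgemm result differs")
```

      If this small call already crashes on the Haswell machine but runs on the others, the problem is in the library's dgemm path rather than in the calling application.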

      Some backtraces:

      time_dgemm.exe on acml6

      #0  0x00007ffff62f3985 in dmmavxblka_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #1  0x00007fffffffde2c in ?? ()

      #2  0x0000000000000001 in ?? ()

      #3  0x00007fffffffde30 in ?? ()

      #4  0x0000000000000064 in ?? ()

      #5  0xffffde2c00000064 in ?? ()

      #6  0xffffffffffffff9b in ?? ()

      #7  0xffffffffffffffff in ?? ()

      #8  0x000000000062e0e0 in ?? ()

      #9  0x0000006400000000 in ?? ()

      #10 0x00007fffffffde40 in ?? ()

      #11 0x0000000000000001 in ?? ()

      #12 0xffffffffffffff9b in ?? ()

      #13 0x0000000000000064 in ?? ()

      #14 0xffffffffffffff9b in ?? ()

      #15 0x00000000ffffde2a in ?? ()

      #16 0x00007ffff7d13420 in _GLOBAL_OFFSET_TABLE_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #17 0x00000000ffffde2c in ?? ()

      #18 0x00007fffffffde34 in ?? ()

      #19 0x0000000000008000 in ?? ()

      #20 0x0000000000d590dd in ?? ()

      #21 0x00007ffffff36000 in ?? ()

      #22 0x00007ffff7de9557 in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>)

          at ../elf/dl-runtime.c:111

      #23 0x00007ffff7df0515 in _dl_runtime_resolve () at ../sysdeps/x86_64/dl-trampoline.S:45

      #24 0x00007ffff6211acb in dgemmwraplfma3_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #25 0x00007ffff61fd730 in dgemmchfma3_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #26 0x00007ffff620ec17 in dgemmompfma3_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #27 0x00007ffff620c09b in dgemmp_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #28 0x00007ffff620cafb in dgemm_ ()

         from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #29 0x0000000000401477 in dotime_ ()

      #30 0x0000000000401b92 in MAIN__ ()

       

      time_dgemm.exe on acml5.3.1

      Backtrace for this error:

      #0  0x7F3C4B3B77D7

      #1  0x7F3C4B3B7DDE

      #2  0x7F3C4AD08FEF

      #3  0x420C31 in dgemmwraplfma3_

      #4  0x41F7AF in dgemmchfma3_

      #5  0x404526 in dgemmompfma3_

      #6  0x4028AE in dgemmp_

      #7  0x4030DA in dgemm_

      #8  0x401E56 in dotime_

      #9  0x402571 in MAIN__ at time_dgemm.f90:?

       

      lifev examples with acml6

      Program received signal SIGSEGV, Segmentation fault.

      0x0000000000000000 in ?? ()

      (gdb) bt

      #0  0x0000000000000000 in ?? ()

      #1  0x00007ffff5ffeb59 in dgemv_ () from /home/federico/dev-tesi/acml/acml6.0.4-install/gfortran64/lib/libacml.so

      #2  0x0000000000f78669 in umfdi_local_search ()

      #3  0x0000000000f75589 in umfdi_kernel ()

      #4  0x0000000000f74ac4 in umfpack_di_numeric ()

      #5  0x0000000000aa8857 in Amesos_Umfpack::PerformNumericFactorization (this=0x20d4770)

       

      Any suggestions?

       

      Federico