3 Replies Latest reply on Jan 2, 2013 3:59 PM by rc556677

    Illegal instruction: ACML 5.3.0+Open64 4.5.2

    rc556677

      This is on Fedora 18 Beta:

       

      Fortran examples are failing with "Illegal instruction" using Open64 4.5.2 and ACML 5.3.0.

      The target CPU is Phenom II X4 965

      All C and C++ examples pass. Only the Fortran examples fail.

      It looks like an illegal instruction in the Fortran support library.

      I wonder whether this is an Open64 or ACML problem?

       

      The combination of  GCC 4.7.2 and ACML 5.3.0 passes all examples

       

      [178502.799741] traps: sgetrf_example.[8443] trap invalid opcode ip:7f87dac66780 sp:7fffd45fe468 error:0 in libffio.so[7f87dac14000+84000]
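The kernel trap line above already locates the fault: subtracting the mapping base (the number before `+84000`) from `ip` gives the offset of the faulting instruction inside libffio.so. A minimal sketch of that arithmetic follows. The helper name `trap_offset` is made up for illustration, and the library path in the comment is a placeholder, not something stated in this thread.

```shell
# Offset of the faulting instruction inside the mapped library.
# Arguments are hex values from the kernel "traps:" line, without a 0x prefix:
#   $1 = ip, $2 = mapping base (the number before "+84000")
trap_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

trap_offset 7f87dac66780 7f87dac14000   # prints 0x52780

# The instruction at that offset could then be inspected with something like:
#   objdump -d --start-address=0x52780 --stop-address=0x52790 /path/to/libffio.so
```

The result, 0x52780, is below the 0x84000 mapping size, as it should be. Using it directly as a disassembly address assumes the library's first segment is mapped at offset 0, which is typical for shared libraries but not guaranteed.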

       

      $ /opt/acml5.3.0/util/cpuid.exe

      Chip manufacturer: AuthenticAMD

      AuthenticAMD family 15 extended family 1 model 4

      Model Name: AMD Phenom(tm) II X4 965 Processor

      Chip supports SSE

      Chip supports SSE2

      Chip supports SSE3

      Chip does not support AVX

      Chip does not support FMA3

      Chip does not support FMA4
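On Linux the same information can be cross-checked without cpuid.exe by reading the `flags` line of /proc/cpuinfo. This is a small sketch, assuming a Linux x86 system; `has_cpu_flag` is a hypothetical helper. Note that the kernel reports SSE3 as `pni` and FMA3 as `fma`.

```shell
# Succeeds if the named CPU flag appears on the first "flags" line.
#   $1 = flag name (e.g. avx), $2 = cpuinfo file (defaults to /proc/cpuinfo)
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | tr ' ' '\n' | grep -qx "$1"
}

# Report the flags relevant to this thread, if /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    for f in sse2 pni avx fma fma4; do
        if has_cpu_flag "$f"; then echo "$f: yes"; else echo "$f: no"; fi
    done
fi
```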

       

      Compiling program acmlinfo.f:

      openf95 -c -O1 acmlinfo.f -o acmlinfo.o

      Linking program acmlinfo.exe:

      openf95  acmlinfo.o  /opt/acml5.3.0/open64_64/lib/libacml.a -lrt -ldl -o acmlinfo.exe

      Running program acmlinfo.exe:

      (export LD_LIBRARY_PATH='/opt/acml5.3.0/open64_64/lib:'; ./acmlinfo.exe > acmlinfo.res 2>&1)

      ACML (AMD Core Math Library) version 5.3.0.67  (Tue Dec 11 04:15:54 CST 2012)

      Copyright AMD,NAG 2012

      Build system: Linux 3.0.13-0.27-default x86_64 acml-build-lin2

      Built using Fortran compiler: openf95 Open64 Compiler Suite: Version 4.5.2

         with flags:  -OPT:vcast_complex=OFF -Wall -fPIC -fno-second-underscore -DUSE_ACMLMALLOCFAST -m64 -DIS_64BIT -march=opteron -msse -msse2 -O2

      and C compiler: gcc (GCC) 4.7.1

         with flags: -L/opt/x86_open64-4.5.2/lib/gcc-lib/x86_64-open64-linux/4.5.2 -Wall -W -Wno-unused-parameter -Wstrict-prototypes -Wwrite-strings -D_GNU_SOURCE -D_ISOC99_SOURCE -fPIC -DUSE_ACMLMALLOCFAST -m64 -DIS_64BIT -march=opteron -msse -msse2 -O3

       

       

      Compiling program sgetrf_example.f:

      openf95 -c -O1 sgetrf_example.f -o sgetrf_example.o

      Linking program sgetrf_example.exe:

      openf95  sgetrf_example.o  /opt/acml5.3.0/open64_64/lib/libacml.a -lrt -ldl -o sgetrf_example.exe

      Running program sgetrf_example.exe:

      (export LD_LIBRARY_PATH='/opt/acml5.3.0/open64_64/lib:'; ./sgetrf_example.exe > sgetrf_example.res 2>&1)

      /bin/sh: line 1:  8443 Illegal instruction     (core dumped) ./sgetrf_example.exe > sgetrf_example.res 2>&1

      make: *** [sgetrf_example.res] Error 132

        • Re: Illegal instruction: ACML 5.3.0+Open64 4.5.2
          chipf

          I can duplicate this issue.  I ran under gdb, and the illegal instruction is a vzeroupper instruction, which is an AVX opcode, not supported by the part you are using. 

          This instruction is at the start of the main program, which means that the open64 compiler is putting it in by default.  I even added -march=opteron -msse -msse2 (the flags used for the ACML library build) and that did not change the problem.  I then added -mno-avx to the command line and the problem no longer occurs in MAIN, but instead occurs in the ACML library.

           

          Unfortunately we did not build the ACML library with -mno-avx, so vzeroupper occurs in many places.

           

          We'll have to rebuild the open64 version to resolve this problem. I'm not sure when we will be able to post it.

            • Re: Illegal instruction: ACML 5.3.0+Open64 4.5.2
              chipf

              My previous post is slightly in error.  The next illegal instruction is in the libfortran.a runtime supplied by the open64 compiler.
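One way to see which libraries actually contain the offending opcode is to disassemble them and count `vzeroupper`. This is only a sketch, assuming GNU binutils `objdump` is installed; `count_insn` is a hypothetical helper, and the libfortran.a path in the comment is a placeholder.

```shell
# Count lines of disassembly (read from stdin) that contain the given
# instruction mnemonic as a whole word.
count_insn() {
    grep -cw -- "$1"
}

# Usage (adjust the paths to wherever the libraries are installed):
#   objdump -d /path/to/libfortran.a | count_insn vzeroupper
#   objdump -d /opt/acml5.3.0/open64_64/lib/libacml.a | count_insn vzeroupper
```

A count of 0 for a library suggests it is safe on a non-AVX part as far as this one instruction is concerned; it says nothing about other AVX opcodes.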

              However I was able to solve the problem!

               

              On the open64 compiler page you should find two recent 4.5.2-1 builds.  The first set are for "Piledriver core" devices.

              The second set are for any x86_64 parts. 

              http://developer.amd.com/tools/cpu-development/x86-open64-compiler-suite/

              The file name is: x86_open64-4.5.2-1.rhel5_sles10.x86_64.tar.bz2

               

              I downloaded this second version and installed it, making sure that LD_LIBRARY_PATH points to the runtime libraries from this new version.  I added -mno-avx to the FLAGS definitions in the example GNUmakefile.

              After these changes, the examples all built and ran correctly.

               

              Apparently this alternate build of the open64 compiler has the runtimes built without AVX instructions.

              The good news is there is no need for a new ACML version.

                • Re: Illegal instruction: ACML 5.3.0+Open64 4.5.2
                  rc556677

                  Thank you for investigating this - it has solved my problem.

                   

                  Following your instructions I installed x86_open64-4.5.2-1.rhel5_sles10.x86_64.rpm

                  on Fedora 18/Phenom II X4 965. I checked that the runtimes libfortran.so libacml_mv.so libffio.so

                  are from this build. Now Open64/ACML 5.3.0 examples all build and run correctly.
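The check mentioned here, that the runtimes come from the new build, can be done with `ldd`. This is a sketch, assuming the example was linked against the shared runtimes; `runtime_paths` is a hypothetical helper.

```shell
# Read ldd output on stdin and print "library path" for the Open64
# runtimes named in this thread.
runtime_paths() {
    awk '$2 == "=>" && $1 ~ /^(libfortran|libffio|libacml_mv)\.so/ { print $1, $3 }'
}

# Usage:
#   ldd ./sgetrf_example.exe | runtime_paths
```

Each printed path should sit under the install directory of the 4.5.2-1 build. If one does not, LD_LIBRARY_PATH is probably still picking up the older runtimes.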

                   

                  Thanks again.

                   

                  Richard