5 Replies Latest reply on Apr 24, 2015 12:52 PM by gomerofdoom

    Problems with ACML on Arch Linux

    gomerofdoom

      Hello all,

       

      I'm new to the forums and relatively new to OpenCL and ACML. I'm really hoping someone can help me, as I'm quite stumped.

       

      Several months ago, I had a relatively simple C++ program working with a few ACML routines (the LAPACK routines dgetrf and dgetri, and the BLAS routine dgemm).

       

      Unfortunately, I can't remember exactly which version of ACML I was using at that point, but I'm pretty sure it was at least 6.0. I was also using Arch Linux with the Catalyst drivers (again, I can't remember which version).

       

      Fast-forward to today. I just finished a clean install of Arch Linux, and everything seemed to be working, but when I try to run my code, it hangs on the call to dgetri.

       

      I've done some digging and found that running "make" in the ACML example directories (either gfortran64 or gfortran64_mp) also hangs. For example, when I run "make" in the gfortran64 example directory, I get the following output:

       

      Compiling program acmlinfo.f:

      gfortran -c  acmlinfo.f -o acmlinfo.o

      Linking program acmlinfo.exe:

      gfortran  acmlinfo.o  -L ../lib -lrt -ldl -lstdc++ -lacml -o acmlinfo.exe

      Running program acmlinfo.exe:

      (export LD_LIBRARY_PATH='../lib:/home/gomer/local/acml/gfortran64/lib'; ./acmlinfo.exe > acmlinfo.res 2>&1)

      ACML (AMD Core Math Library) version 6.1.0.31  (Mon Nov 17 14:47:24 CST 2014)

      Copyright AMD,NAG 2014

      Build system: Linux 3.0.76-0.11-default x86_64 acml-build-lin2

      Built using Fortran compiler: GNU Fortran (GCC) 4.7.1

         with flags:  -ffixed-line-length-132 -Wall -W -Wno-unused -Wno-unused-dummy-argument -Wno-conversion -Wno-uninitialized -fPIC -fno-second-underscore -fimplicit-none -DUSE_ACMLMALLOCFAST -m64 -DIS_64BIT -msse2 -O3

      and C compiler: gcc (GCC) 4.7.1

         with flags:  -W -Wno-unused-parameter -Wstrict-prototypes -Wwrite-strings -D_GNU_SOURCE -D_ISOC99_SOURCE -fPIC  -DUSE_ACMLMALLOCFAST -m64 -DIS_64BIT -mstackrealign -msse2 -O3

       

      Compiling program sgetrf_example.f:

      gfortran -c  sgetrf_example.f -o sgetrf_example.o

      Linking program sgetrf_example.exe:

      gfortran  sgetrf_example.o  -L ../lib -lrt -ldl -lstdc++ -lacml -o sgetrf_example.exe

      Running program sgetrf_example.exe:

      (export LD_LIBRARY_PATH='../lib:/home/gomer/local/acml/gfortran64/lib'; ./sgetrf_example.exe > sgetrf_example.res 2>&1)

       

      At that point, the program seems to get stuck doing nothing.

       

      This is the same behavior I see with my own code. Interestingly, in my own code the call to dgetrf completes successfully, but the subsequent call to dgetri is where it hangs.

       

      The code compiles fine, and I'm not seeing any errors... it just seems to get stuck.

       

      One last thing that may be worth noting: if I boot to the command line (no X) and try to run the program, I get "Error. No root privileges. Check with your system admin." But I can't figure out WHERE that error is coming from.
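One way to narrow down where that message comes from is to search the candidate files (the executable, the ACML lib directory, and any driver libraries it pulls in) for the literal text. Below is a minimal C++ sketch of that byte-string search; the helper names `file_contains` and `files_containing` are hypothetical, and nothing in this thread establishes which file actually carries the message.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// True if the file at `path` contains `needle` as a raw byte sequence.
// The whole file is read into memory, which is fine for typical .so/.exe sizes.
bool file_contains(const std::string& path, const std::string& needle) {
    assert(!needle.empty());  // an empty needle would trivially "match" every file
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // unreadable or missing file: treat as "not found"
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return data.find(needle) != std::string::npos;
}

// Filters `paths` down to the files that contain `needle`.
std::vector<std::string> files_containing(const std::vector<std::string>& paths,
                                          const std::string& needle) {
    std::vector<std::string> hits;
    for (const auto& p : paths)
        if (file_contains(p, needle)) hits.push_back(p);
    return hits;
}
```

For example, passing every file under ~/local/acml/gfortran64/lib, plus whatever `ldd` reports for the executable, with the needle "No root privileges" would list the files that contain the message. A library that is loaded at run time with dlopen would not show up in `ldd`, so an empty result does not rule everything out.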

       

      I'm not really sure where to start trying to fix this. Any and all help is greatly appreciated... I'm trying to get my code back up and running for my dissertation research!!

       

      Thanks,

       

      Paul

        • Re: Problems with ACML on Arch Linux
          gomerofdoom

          Hello all,


          I realized that my post above is a little vague, so I've collected some more info, following the example of user "vigo" in another post. Hopefully it's helpful.

           

          $ uname -a

           

          Linux prometheus 3.19.3-3-ARCH #1 SMP PREEMPT Wed Apr 8 14:10:00 CEST 2015 x86_64 GNU/Linux

           

           

          $ mpicxx -v

           

          Using built-in specs.

          COLLECT_GCC=/usr/bin/g++

          COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper

          Target: x86_64-unknown-linux-gnu

          Configured with: /build/gcc/src/gcc-4.9-20150304/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --disable-multilib --disable-werror --enable-checking=release

          Thread model: posix

          gcc version 4.9.2 20150304 (prerelease) (GCC)

           

          $ util/cpuid.exe


          Chip manufacturer: AuthenticAMD

          AuthenticAMD family 15 extended family 6 model 2

          Model Name: AMD FX(tm)-8320 Eight-Core Processor          

          Chip supports SSE

          Chip supports SSE2

          Chip supports SSE3

          Chip supports AVX

          Chip supports FMA3

          Chip supports FMA4

           

           

          $ gdb sgetrf_example.exe

           

          GNU gdb (GDB) 7.9

          Copyright (C) 2015 Free Software Foundation, Inc.

          License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

          This is free software: you are free to change and redistribute it.

          There is NO WARRANTY, to the extent permitted by law.  Type "show copying"

          and "show warranty" for details.

          This GDB was configured as "x86_64-unknown-linux-gnu".

          Type "show configuration" for configuration details.

          For bug reporting instructions, please see:

          <http://www.gnu.org/software/gdb/bugs/>.

          Find the GDB manual and other documentation resources online at:

          <http://www.gnu.org/software/gdb/documentation/>.

          For help, type "help".

          Type "apropos word" to search for commands related to "word"...

          Reading symbols from sgetrf_example.exe...done.

          (gdb) run

          Starting program: /home/gomer/local/acml/gfortran64/examples/sgetrf_example.exe

          [Thread debugging using libthread_db enabled]

          Using host libthread_db library "/usr/lib/libthread_db.so.1".

          ACML example: solution of linear equations using SGETRF/SGETRS

          --------------------------------------------------------------

           

          Matrix A:

               1.800E+00    2.880E+00    2.050E+00   -8.900E-01

               5.250E+00   -2.950E+00   -9.500E-01   -3.800E+00

               1.580E+00   -2.690E+00   -2.900E+00   -1.040E+00

              -1.110E+00   -6.600E-01   -5.900E-01    8.000E-01

           

          Right-hand-side matrix B:

               9.520E+00    1.847E+01

               2.435E+01    2.250E+00

               7.700E-01   -1.328E+01

              -6.220E+00   -6.210E+00

           

           

          Again... the code seems to get to that point and then just stops: no error message, no activity. If I open another terminal and run "top", I see:

           

          PID   USER   PR  NI  VIRT    RES    %CPU  %MEM  TIME+    S  COMMAND
          2105  gomer  20  0   238.9m  61.5m  0.0   0.2   0:00.21  S  `- sgetrf_example.
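In top's output, state `S` with 0.0 %CPU means the process is sleeping (blocked waiting on something) rather than spinning in a loop. A small Linux-only sketch for checking this from code reads the `State:` field of `/proc/<pid>/status` and the kernel wait channel from `/proc/<pid>/wchan`; the function names are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Reads the "State:" field of /proc/<pid>/status, e.g. "S (sleeping)".
// `pid` may be a number, "self", or "<pid>/task/<tid>" to inspect one thread.
// Returns "" if the file cannot be read (e.g. the process has exited).
std::string proc_state(const std::string& pid) {
    assert(!pid.empty());
    std::ifstream in("/proc/" + pid + "/status");
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("State:", 0) == 0) {
            // Skip the 6-character label and the whitespace that follows it.
            const auto pos = line.find_first_not_of(" \t", 6);
            return pos == std::string::npos ? "" : line.substr(pos);
        }
    }
    return "";
}

// Reads /proc/<pid>/wchan: the kernel function a sleeping task is blocked in.
// It may read "0" when the task is not sleeping or the kernel does not expose it.
std::string proc_wchan(const std::string& pid) {
    assert(!pid.empty());
    std::ifstream in("/proc/" + pid + "/wchan");
    std::string value;
    std::getline(in, value);
    return value;
}
```

For the hung run above, `proc_state("2105")` would be expected to agree with top. Because a library may start its own threads, calling it with "2105/task/<tid>" for each directory under /proc/2105/task shows whether every thread is blocked or only some of them.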

           

          I'm going to download ACML 5.3 and see whether that works.

           

          Any help is greatly appreciated. Thanks!

           

          -Paul

          • Re: Problems with ACML on Arch Linux
            gomerofdoom

            Hello again,

             

            One more addition: ACML 5.3 seems to work just fine on the same system. I just downloaded it and ran the examples... everything ran and passed.

             

            I then recompiled my code, linking against the 5.3 libraries, and it seems to be working.
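Since "seems to be working" is hard to judge by eye, a quick numerical check after dgetrf/dgetri is to confirm that A times the computed inverse is close to the identity. The sketch below computes that product with a plain loop (in the original program, dgemm could compute it instead). It assumes column-major storage, the layout LAPACK routines use, and `inverse_residual` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Max-norm of (A * Ainv - I) for n-by-n matrices stored column-major,
// i.e. element (i, j) lives at index i + j * n.
double inverse_residual(const std::vector<double>& a,
                        const std::vector<double>& ainv, int n) {
    assert(n > 0);
    assert(a.size() == static_cast<std::size_t>(n) * n && ainv.size() == a.size());
    double worst = 0.0;
    for (int j = 0; j < n; ++j) {          // column of the product
        for (int i = 0; i < n; ++i) {      // row of the product
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i + k * n] * ainv[k + j * n];  // A(i,k) * Ainv(k,j)
            const double expected = (i == j) ? 1.0 : 0.0;
            worst = std::max(worst, std::fabs(sum - expected));
        }
    }
    return worst;
}
```

A value near rounding level (scaled up by how ill-conditioned A is) is what a correct inverse should give; a large value would point to a wrong result even when the call returns normally.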

             

            So, for now, I will use ACML 5.3, but I'm still a bit stumped and frustrated that ACML 6.1 is not working. Again, any help is greatly appreciated.

             

            Thanks,

             

            Paul

              • Re: Problems with ACML on Arch Linux

                As a heads-up, there was a major change in ACML between v5 and v6. You can see a high-level description here: http://developer.amd.com/community/blog/2014/06/19/acml-6-ga/

                 

                In effect, in v6 the team introduced support for heterogeneous architectures. I know this doesn't solve the problem, but it does give you a little context.

                  • Re: Problems with ACML on Arch Linux
                    gomerofdoom

                    Thanks for responding.

                     

                    Yes, I was aware of this change to ACML, which is why I was trying to use version 6. For now, version 5.3 will get the job done, but I was hoping to take advantage of the HSA capability of version 6 at some point down the road.

                     

                    I will be on my office machine later today, which still has an earlier version of ACML 6 on it (ACML 6.05, I think?). I will send myself a tarball of that version to test on my home machine. If it works, then something that changed between that version and the latest one is the problem. If it doesn't, then some change I've made to my home machine is the culprit.

                     

                    I'll post back with results.

                     

                    Thanks again,

                     

                    Paul

                      • Re: Problems with ACML on Arch Linux
                        gomerofdoom

                        Well, the plot thickens. I grabbed ACML version 6.0.6.13 from my other machine, and it works totally fine, but 6.1 still hangs.
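With several ACML installs on the same machine, it can be worth confirming which libacml shared library a given run actually loads, since the example Makefile prepends ../lib to LD_LIBRARY_PATH. A minimal Linux-only sketch follows. It assumes the library's file name contains "acml" (the link line uses -lacml) and that it is linked dynamically, because a statically linked copy would not appear as a separate mapping. It lists the mapped files from /proc/self/maps whose path matches; `mapped_objects` is a hypothetical name.

```cpp
#include <cassert>
#include <fstream>
#include <set>
#include <sstream>
#include <string>

// Returns the distinct file paths mapped into this process (from /proc/self/maps)
// whose path contains `pattern`, e.g. "acml". Linux-specific.
// Note: a path containing spaces would be truncated at the first space.
std::set<std::string> mapped_objects(const std::string& pattern) {
    assert(!pattern.empty());
    std::set<std::string> found;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line: address perms offset dev inode [pathname]
        std::istringstream fields(line);
        std::string addr, perms, offset, dev, inode, path;
        fields >> addr >> perms >> offset >> dev >> inode >> path;
        // Anonymous mappings have no pathname, so `path` stays empty.
        if (!path.empty() && path.find(pattern) != std::string::npos)
            found.insert(path);
    }
    return found;
}
```

Printing `mapped_objects("acml")` from the program just before the dgetri call would show whether the 6.1 lib directory, or some other install, was picked up.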

                         

                        So, to summarize, on this machine:

                         

                        acml5.3/gfortran64/examples -> make works fine - all tests passed

                        acml6.0.6/gfortran64/examples -> make works fine - all tests passed

                        acml6.1/gfortran64/examples -> make hangs on sgetrf_example
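To make that version comparison scriptable, a hang can be turned into a reportable failure by running the suspect call (for example, the dgetri call in the original program) on a worker thread with a time limit. This is a minimal C++ sketch with a hypothetical `finishes_within` helper. It only detects the hang and does not diagnose it. A hung worker also cannot be cancelled safely, so the caller should report and exit immediately.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Runs `work` on a detached thread and waits up to `limit` for it to return.
// Returns true if it finished (normally or by throwing), false on timeout.
// On timeout the worker is still running and cannot be stopped safely, so the
// caller should report the failure and terminate the process (e.g. std::_Exit).
bool finishes_within(std::function<void()> work, std::chrono::milliseconds limit) {
    assert(work);
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();
    std::thread([work = std::move(work), done] {
        try {
            work();
            done->set_value();
        } catch (...) {
            done->set_exception(std::current_exception());
        }
    }).detach();
    // Unlike a std::async future, this future's destructor does not block,
    // so returning early on timeout is fine.
    return finished.wait_for(limit) == std::future_status::ready;
}
```

A small driver could wrap each suspect call, print which one timed out, and call std::_Exit(1). Running the same driver against 5.3, 6.0.6 and 6.1 would then give a pass or timeout result per version instead of a stuck terminal. If the wrapped lambda captures locals by reference, the caller must not return normally after a timeout, because the worker may still be using them. Depending on the toolchain, building may need -pthread.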

                         

                        Any ideas?

                         

                        -Paul