3 Replies Latest reply on Aug 20, 2009 9:44 AM by dbacklund

    Problem compiling LAPACK timing suite with openf90

    dbacklund

      I am having a minor issue with openf90 and may have uncovered a bug. I am trying to compile and run the LAPACK timing suite from netlib.org so that I can time ACML routines against those of LAPACK+GotoBLAS. The timing executable builds, but it dies with a segmentation fault when it is run. My steps to reproduce are below. I am using CentOS 5.3 on a Rocks cluster with gcc version 4.1.2 20071124 (from gcc -v) on dual AMD Barcelona hardware.

      Download both the LAPACK source as well as the timing package and unpack them into a clean directory.

      [code]
      wget http://www.netlib.org/lapack/lapack.tgz
      wget http://www.netlib.org/lapack/timing/timing.tgz

      tar xfz lapack.tgz;cd lapack-3.2.1;tar xfz ../timing.tgz
      [/code]

      Edit make.inc to use openf90 as the compiler and loader and ACML as the BLAS library. Here is the make.inc I use to reproduce the error.

      [code]
      SHELL = /bin/sh
      PLAT = _LINUX
      #
      FORTRAN  = openf90
      OPTS     = -g -O0
      DRVOPTS  = $(OPTS)
      NOOPT    = -g -O0
      LOADER   = openf90
      LOADOPTS = -Wl,-R/opt/acml4.3.0/open64_64/lib
      #
      # Timer for the SECOND and DSECND routines
      #
      # SECOND and DSECND will use a call to the Fortran standard INTERNAL FUNCTION CPU_TIME
      TIMER    = INT_CPU_TIME
      #
      ARCH     = ar
      ARCHFLAGS= cr
      RANLIB   = ranlib
      #
      BLASLIB      = -L/opt/acml4.3.0/open64_64/lib -lacml
      #
      #  Names of generated libraries.
      #
      LAPACKLIB    = lapack$(PLAT).a
      TMGLIB       = tmglib$(PLAT).a
      EIGSRCLIB    = eigsrc$(PLAT).a
      LINSRCLIB    = linsrc$(PLAT).a
      [/code]

      I don't build the LAPACK library since I am using ACML, but I do touch it so that make does not complain. Then build the TMGLIB.

      [code]
      touch lapack_LINUX.a
      make tmglib
      [/code]

      Now enter the TIMING directory and build the executables. The problem I encounter is with the eigenvalue (EIG) timing tests: the single precision executable runs correctly, but double, complex, and double complex all segfault.

      [code]
      cd TIMING/EIG/EIGSRC;make;cd ..;make
      [/code]

      This generates the eigenvalue timing executables. The ldd output for xeigtimd shows it linked against ACML:

      [code]
      [dbacklund@****** TIMING]$ ldd xeigtimd
              libacml.so => /opt/acml4.3.0/open64_64/lib/libacml.so (0x00002ba4e0549000)
              libmv.so.1 => /opt/open64//lib/gcc-lib/x86_64-open64-linux/4.2.2.1/libmv.so.1 (0x00002ba4e2590000)
              libm.so.6 => /lib64/libm.so.6 (0x000000314ea00000)
              libacml_mv.so => /opt/acml4.3.0/open64_64/lib/libacml_mv.so (0x00002ba4e26bd000)
              libc.so.6 => /lib64/libc.so.6 (0x000000314e600000)
              libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000035b2c00000)
              librt.so.1 => /lib64/librt.so.1 (0x000000314fe00000)
              /lib64/ld-linux-x86-64.so.2 (0x000000314e200000)
              libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003160000000)
              libpthread.so.0 => /lib64/libpthread.so.0 (0x000000314f200000)
      [/code]

      [code]
      cd ..
      ./xeigtimd < dseptim.in   # <-- This is where I get the segfault.
      [/code]
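
      To check all four precisions in one pass, a small wrapper along these lines could be used. It is only a sketch: the executable names other than xeigtimd and the input file names other than dseptim.in are assumed to follow the same s/d/c/z pattern, and the helper names classify_status and run_one are made up for illustration. It relies on the shell reporting a child killed by SIGSEGV as exit status 139 (128 + 11).

```shell
#!/bin/sh
# Map a shell exit status to a short label.
# 139 = 128 + 11 (SIGSEGV) is how sh/bash report a segfaulted child.
classify_status() {
    case "$1" in
        0)   echo "ok" ;;
        139) echo "segfault" ;;
        *)   echo "failed($1)" ;;
    esac
}

# Run one timing executable with its input file and print the result.
# $1 = precision prefix (s, d, c, z). File names are ASSUMED to follow
# the xeigtimd / dseptim.in pattern from the post.
run_one() {
    exe="./xeigtim$1"
    input="$1septim.in"
    if [ ! -x "$exe" ] || [ ! -r "$input" ]; then
        echo "$exe: skipped (missing executable or input)"
        return 0
    fi
    "$exe" < "$input" > "xeigtim$1.out" 2>&1
    status=$?
    echo "$exe: $(classify_status "$status")"
}

for p in s d c z; do
    run_one "$p"
done
```

      Run it from the TIMING directory. Each program's output goes to xeigtim?.out, so only the one-line verdict per precision is printed.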

      The segfault occurs regardless of the input file. I have run the program under GDB and included the output below.

      [code]
      [dbacklund@******* TIMING]$ gdb xeigtimd             
      GNU gdb Red Hat Linux (6.5-37.el5_2.2rh)
      Copyright (C) 2006 Free Software Foundation, Inc.
      GDB is free software, covered by the GNU General Public License, and you are
      welcome to change it and/or distribute copies of it under certain conditions.
      Type "show copying" to see the conditions.
      There is absolutely no warranty for GDB.  Type "show warranty" for details.
      This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

      (gdb) run < dseptim.in
      Starting program: /home/dbacklund/ARCHIVES/TAR/lapack-timing-test/lapack-3.2.1/TIMING/xeigtimd < dseptim.in
      [Thread debugging using libthread_db enabled]
      [New Thread 47510111176096 (LWP 16649)]

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 47510111176096 (LWP 16649)]
      0x00000000004106ef in MAIN__ ()
          at /home/dbacklund/ARCHIVES/TAR/lapack-timing-test/lapack-3.2.1/TIMING/EIG/dtimee.f:1
      1             PROGRAM DTIMEE
      Current language:  auto; currently fortran

      (gdb) backtrace
      #0  0x00000000004106ef in MAIN__ ()
          at /home/dbacklund/ARCHIVES/TAR/lapack-timing-test/lapack-3.2.1/TIMING/EIG/dtimee.f:1
      #1  0x00000000004da062 in main () at ../../libf/fio/main.c:58
      [/code]

      Line 1 of dtimee.f is

      [code]
            PROGRAM DTIMEE
      [/code]
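
      A crash reported at the PROGRAM statement, before any executable line, is sometimes a sign that the main program's local arrays do not fit in the stack. Whether that is what happens here is an untested assumption, not something the output above confirms. A minimal sketch for checking it: print the soft stack limit, then retry the run with the limit raised inside a subshell. The helper name stack_limit is made up for illustration.

```shell
#!/bin/sh
# Print the current soft stack limit: kilobytes, or "unlimited".
stack_limit() {
    ulimit -s
}

echo "soft stack limit: $(stack_limit)"

# Retry with a raised limit. The subshell keeps the change from
# persisting in the calling shell. This only tests the ASSUMPTION
# that the crash is a stack overflow.
(
    ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"
    if [ -x ./xeigtimd ] && [ -r dseptim.in ]; then
        ./xeigtimd < dseptim.in > xeigtimd.out 2>&1
        echo "exit status with raised limit: $?"
    fi
)
```

      If the exit status changes from 139 to 0 with the raised limit, the stack size is the likely culprit. If it stays at 139, the cause lies elsewhere.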

      This is not a mission-critical error, but I did have a similar problem with openf90 and a utility program included in the software that we use for research. That software is closed source, so I cannot share it publicly. The backtrace from that utility's segfault also points at line 1, the PROGRAM programname line.

      As I said, this is not urgent, but maybe it will be useful to someone.