15 Replies Latest reply on Aug 23, 2012 4:08 AM by santosh.zanjurne

    Problems building openmpi 1.6 with AMD open64 compiler

    mithion

      With the new release of AMD open64 4.5.2, I wanted to recompile openmpi against the new version, but I've been having problems building it. I've tried several things. Running the following configure command:

      # ./configure --prefix=/usr/local/openmpi CC=opencc CXX=openCC F77=openf90 FC=openf90 CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64

      results in the following error:

       

      checking Fortran 90 kind of MPI_INTERGER_KIND (selected_int_kind(9))... ./configure: line 53651: 16695 Illegal instruction     ./conftest 1>&5 2>&1

      configure: error: Could not determine kind of selected_int_kind(MPI_INTEGER_KIND)

       

      I've also tried omitting the F77 variable from the configure line. Configuration then proceeds without errors, and I can build and install openmpi. However, the resulting install reports that openmpi was not built with Fortran 90 support, so the mpif90 command does not work. Is anybody else seeing these kinds of errors?

       

      Philippe

        • Re: Problems building openmpi 1.6 with AMD open64 compiler
          santosh.zanjurne

          Hi Philippe,

          Can you give me details on the machine: processor, OS name and version, and the gcc, glibc, and binutils versions? Also let me know how you are using the Open64 compiler, i.e. whether you built it from source or which binary package you downloaded.

           

          I tried this on RHEL 6.2 and SLES 11 SP2, but I could not reproduce the issue.

           

          Regards,

          Santosh

            • Re: Problems building openmpi 1.6 with AMD open64 compiler
              mithion

              We are using this on a cluster, and I'm currently trying to build openmpi 1.6 (the latest version at this time) with open64 on the head node. Our head node has an Intel Core 2 Quad Q8400, but the compute nodes run Opteron 6272s, which is why we wish to use open64. I downloaded openmpi 1.6, extracted the compressed tar file, changed into the directory, and ran the ./configure command I posted above.

               

              I've tried different combinations of options (i.e. FC only, FC + F77, FC + CC + CXX, FC + F77 + CXX + CC, etc.). However, the openmpi FAQ recommends building openmpi with a consistent compiler suite for best results. If I understand correctly, GCC is used wherever a compiler isn't explicitly specified. I only get the above error (the one about selected_int_kind) when I specify both the FC and F77 compilers. If I omit F77, ./configure completes without error, but I run into another problem down the road.

               

              So to give you some information about our setup:

              We are running Rocks Cluster Linux 6.0 (which is built from CentOS 6.2)

              # rpm -q gcc glibc binutils

              gcc-4.4.6-3.el6.x86_64

              glibc-2.12-1.47.el6_2.9.x86_64

              glibc-2.12-1.47.el6_2.9.i686

              binutils-2.20.51.0.2-5.28.el6.x86_64

               

              I also wanted to add that we've been successfully using open64 4.5.1 on this system for a few months with the same version of openmpi.

              • Re: Problems building openmpi 1.6 with AMD open64 compiler
                mithion

                So I went ahead and switched my environment back to open64 4.5.1, and I was able to successfully run the ./configure command from the original post. So something changed with 4.5.2.

                  • Re: Problems building openmpi 1.6 with AMD open64 compiler
                    santosh.zanjurne

                    Since you are running the application on Opteron machines, you should set all the flag variables (CFLAGS, CXXFLAGS, FFLAGS, FCFLAGS) to "-march=bdver1". Doing this helps the compiler generate optimized code for the target architecture and should help you get the best performance. I shall still try to reproduce the issue you reported. Let me know if this helps.
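For readers following along, here is a minimal sketch of what that suggestion could look like when applied to the configure line from the original post. The function name is hypothetical, and keeping -m64 alongside -march=bdver1 is an assumption. One caveat: configure compiles and runs small test programs on the build host, so these flags presumably only work where the build host itself can execute bdver1 code.

```shell
# Sketch only: the original configure line with -march=bdver1 added to
# every flag variable. Run from the top of the extracted openmpi-1.6 tree.
# The function name is hypothetical; keeping -m64 is an assumption.
configure_openmpi_bdver1() {
    flags="-m64 -march=bdver1"
    ./configure --prefix=/usr/local/openmpi \
        CC=opencc CXX=openCC F77=openf90 FC=openf90 \
        CFLAGS="$flags" CXXFLAGS="$flags" FFLAGS="$flags" FCFLAGS="$flags"
}
```

Calling configure_openmpi_bdver1 from the source directory runs the configure step with the combined flags.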

                     

                    Regards,

                    Santosh

                      • Re: Problems building openmpi 1.6 with AMD open64 compiler
                        santosh.zanjurne

                        Philippe,

                        Can you send the console output of the attached program, as well as the files generated by the compiler? The test program and the command to run are in the attached file.

                          • Re: Problems building openmpi 1.6 with AMD open64 compiler
                            mithion

                            So here is the requested information. I've also included the output of the compilation and of the test program in the file output.txt. It appears to result in an illegal instruction.

                              • Re: Problems building openmpi 1.6 with AMD open64 compiler
                                craas

                                I can reproduce this issue (both the OpenMPI build failure and the illegal instruction in the test program).

                                 

                                $ strace ./a.out >strace.out 2>&1

                                Illegal instruction

                                 

                                strace.out is attached

                                 

                                $ uname -a

                                Linux x 2.6.32-279.1.1.el6.x86_64 #1 SMP Tue Jul 10 11:24:23 CDT 2012 x86_64 x86_64 x86_64 GNU/Linux

                                 

                                $ cat /etc/redhat-release

                                Scientific Linux release 6.2 (Carbon)

                                 

                                # rpm -q gcc glibc binutils

                                gcc-4.4.6-3.el6.x86_64

                                glibc-2.12-1.80.el6_3.3.x86_64

                                glibc-2.12-1.80.el6_3.3.i686

                                binutils-2.20.51.0.2-5.28.el6.x86_64 

                                  • Re: Problems building openmpi 1.6 with AMD open64 compiler
                                    santosh.zanjurne

                                    I think both of you are facing this problem because you downloaded the compiler binaries that are meant to run on Bulldozer machines, i.e. the package listed under "SLES 11, RHEL 6".

                                    Can you please confirm?

                                     

                                    http://developer.amd.com/tools/open64/pages/default.aspx#four

                                     

                                    On the above link we have two different compiler binaries to download, with rpm and tar version for each. 

                                     

                                    A. SLES 11, RHEL 6 -

                                       Since SLES 11 and RHEL 6 ship by default with a recent binutils package that supports the Bulldozer architecture, these binaries are built ON a Bulldozer machine with the bdver1 flag. The compiler binaries and libraries therefore contain Bulldozer instructions and will not run on non-Bulldozer machines.

                                     

                                    B. SLES 10 SP2, SLES 10 SP3, RHEL 5.5 -

                                       These are built with an older binutils that lacks Bulldozer instruction support, so no Bulldozer instructions are used in the compiler binaries. Binaries in this category should run on both Bulldozer and non-Bulldozer machines.

                                     

                                    On a non-Bulldozer machine, one should use the binaries listed under 'B' above.
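As a rough illustration of the rule above, the following sketch reads /proc/cpuinfo and suggests package A or B. It assumes that a Bulldozer CPU reports vendor AuthenticAMD and "cpu family : 21" (family 15h); the function name and the wording of the messages are made up for this example.

```shell
# Sketch only: suggest which Open64 4.5.2 binary package fits this host.
# Assumption: Bulldozer reports vendor AuthenticAMD and cpu family 21 (15h).
# The function name is hypothetical.
pick_open64_package() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first vendor_id and cpu family entries (first logical CPU).
    vendor=$(awk -F': *' '/^vendor_id/ {print $2; exit}' "$cpuinfo")
    family=$(awk -F': *' '/^cpu family/ {print $2; exit}' "$cpuinfo")
    if [ "$vendor" = "AuthenticAMD" ] && [ "$family" = "21" ]; then
        echo "A: SLES 11 / RHEL 6 package (built with bdver1)"
    else
        echo "B: SLES 10 / RHEL 5.5 package (no Bulldozer instructions)"
    fi
}
```

Running pick_open64_package with no argument checks the current host; passing a file name checks a saved copy of another machine's cpuinfo, which can be handy when the head node and the compute nodes differ.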

                                     

                                    Let me know if this helps.

                                     

                                    Regards,

                                    Santosh

                                      • Re: Problems building openmpi 1.6 with AMD open64 compiler
                                        craas

                                        Thanks, Santosh!

                                         

                                        Confirmed: We have

                                        • RHEL 6
                                        • binutils > 2.20.0-0.7.9, and
                                        • AMD Opteron Family 16 (decimal, i.e. Family 10h) instead of AMD Opteron Family 15h

                                        and I chose the RHEL 6 binary package. After switching to the RHEL 5 package (without "intrinsic" bdver1 optimization) the compiler works again. Sorry for this, I should have taken the "Bulldozer architecture" literally.

                                         

                                        Two proposals:

                                        http://developer.amd.com/tools/open64/pages/default.aspx#four

                                        http://developer.amd.com/tools/open64/assets/ReleaseNotes.txt

                                        a) Maybe one should replace the "x86 Open64 4.5.2-1 Compilers for Linux with older GlibC/assembler" by "x86 Open64 4.5.2-1 Compilers for Linux with older GlibC/assembler or for non-Bulldozer architectures" on the web page?

                                        b) In addition, the binutils remarks are somewhat confusing, as two different versions are referred to. Only the "x86 Open64 4.5.2 Release Notes" give the full list of working binutils+distro combinations. Maybe referring to the Release Notes only is less confusing?

                                         

                                        Full summary:

                                         

                                        $ rpm -q binutils

                                        binutils-2.20.51.0.2-5.28.el6.x86_64

                                         

                                        $ cat /etc/redhat-release

                                        Scientific Linux release 6.2 (Carbon) -> aka community RHEL 6.2

                                         

                                        $ cat /proc/cpuinfo|head -25|egrep '^vendor_id|^cpu family|^model|^flags'|perl -pwe 's/[\t ]+/ /gm'

                                        vendor_id : AuthenticAMD

                                        cpu family : 16

                                        model : 9

                                        model name : AMD Opteron(tm) Processor 6128

                                        flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr npt lbrv svm_lock nrip_save pausefilter

                                         

                                        -> Magny-Cours OS6128WKT8EGO

                                        http://developer.amd.com/Assets/CompilerOptQuickRef-61004100.pdf

                                         

                                        Santosh's test case:

                                        x86_open64-4.5.2-1.x86_64.tar.bz2 for RHEL 6 yields illegal instruction.

                                        x86_open64-4.5.2-1.rhel5_sles10.x86_64.tar.bz2 for RHEL 5 works.

                                         

                                        Thanks again!

                                        • Re: Problems building openmpi 1.6 with AMD open64 compiler
                                          mithion

                                          I still think something is wrong with version 4.5.2. I've been using version 4.5.1 and compiling my software with "-march=bdver1", which means I've been using option A for the last 4 months. Option A should work on non-Bulldozer architectures. We use an old Core 2 Quad for our frontend simply because it seemed a waste of resources to put an expensive high-end Opteron on a frontend that realistically does very little work. In the end, I was still able to build openmpi 1.6 with 4.5.1 option A regardless of the frontend architecture. Is there a reason why the version 4.5.2 compiler itself was compiled with bdver1, thus limiting its portability?

                            • Re: Problems building openmpi 1.6 with AMD open64 compiler
                              mithion

                              I was able to fix the problem by installing the compiler from the RPM package instead of the tarball version. So to summarize, the RHEL 6 version of open64 4.5.2 does work on non-bulldozer machines, but I wasn't able to get the tarball to work, only the RPM. Hope this helps others.

                               

                              EDIT: I went and tested the small test program Santosh provided, and it still results in an illegal instruction. But openmpi compiled correctly, and our own internal code compiled and is currently running on the cluster with the new compiler version. There's something still iffy about selected_int_kind...

                                • Re: Problems building openmpi 1.6 with AMD open64 compiler
                                  santosh.zanjurne

                                  If the test program fails with the RPM version of the binaries, then the openmpi build should also fail at the 'configure' stage, since the test program is taken from the 'configure' script in the openmpi sources.

                                   

                                  You can search for the string "checking Fortran 90 kind of MPI_INTEGER_KIND (selected_int_kind(9))" in the config.log file, in the directory where you built the sources, to see whether the test program that failed with the RPM binaries was actually run and passed during the 'configure' stage of openmpi.
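A minimal sketch of that search, assuming config.log is in the directory where ./configure was run. The function name and the 20 lines of trailing context are arbitrary choices for illustration.

```shell
# Sketch only: print the MPI_INTEGER_KIND check from config.log together
# with the lines that follow it, so the result of the test is visible.
# The function name and the context size are hypothetical choices.
show_int_kind_check() {
    log="${1:-config.log}"
    grep -n -A 20 'checking Fortran 90 kind of MPI_INTEGER_KIND' "$log"
}
```

If show_int_kind_check prints nothing and returns a non-zero status, the check string was not found in that log at all.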

                                   

                                  Make sure you have a clean sandbox beforehand; run 'make clean distclean'.

                                   

                                  If you still see it passing then please attach your config.log file here.

                                   

                                  To verify that the binaries in the tar and rpm packages are the same within the SLES10/RHEL5 group or the SLES11/RHEL6 group, run opencc from each install folder with the -v option and compare the build dates. They must be the same within each group.
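One way to script that comparison is sketched below. It assumes each install keeps opencc under bin/ in its prefix and that the -v banner differs only when the builds differ. The function name is hypothetical, and the two prefixes must be supplied by the caller.

```shell
# Sketch only: compare the `opencc -v` banners of two Open64 installs.
# Assumptions: opencc lives in <prefix>/bin and the banner (which includes
# the build date) is written to stdout or stderr.
# The function name is hypothetical.
same_opencc_build() {
    a=$("$1/bin/opencc" -v 2>&1)
    b=$("$2/bin/opencc" -v 2>&1)
    if [ "$a" = "$b" ]; then
        echo "same build"
    else
        echo "different builds"
        return 1
    fi
}
```

Call it with the tarball install prefix and the RPM install prefix as the two arguments. A "different builds" result would be worth reporting together with both banners.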