3 Replies Latest reply on Jun 19, 2012 2:51 AM by albert.solernou

    wrong placement of instances

    albert.solernou

      Hi,

      I am benchmarking a hybrid MPI-OpenMP code that we developed, on several platforms and with several compilers. When using opencc (4.5.1-1, AMD patched, both compiled from source and pre-compiled), I see that threads lump onto the same core, which obviously leads to poor performance. This happens on a two-socket machine with Opteron 6128 CPUs (so 16 cores), using OpenMPI (versions 1.4.4 and 1.6) on an up-to-date Ubuntu server.

       

      However, this issue does not happen with any other compiler. Specifically, I tested GNU's gcc 4.6, Intel's icc 12.0, and even the community-developed Open64's opencc 5.0.

       

      Please find attached a sample code that fails to place the instances correctly, as well as two htop snapshots that show the placement of the instances (wrk) when running this code with two threads and two processes.

       

      The AMD guys recommended that I use your compiler for best results, and we'd obviously love to publish the best results for it.

      Do you have any advice?

        • Re: wrong placement of instances
          santosh.zanjurne

          I have reproduced the problem and will get back to you on this soon.

           

          Regards,

          Santosh

            • Re: wrong placement of instances
              santosh.zanjurne

              Open64 uses the environment variable O64_OMP_SET_AFFINITY to map OpenMP threads to CPUs. By default, this variable is set to TRUE, and the compiler uses the 'ordered core list' (from /proc/cpuinfo) to place OpenMP threads sequentially, one after the other. The placement of OpenMP threads can be altered with another environment variable, O64_OMP_AFFINITY_MAP. The corresponding environment variable in gcc is GOMP_CPU_AFFINITY, which is unset by default: threads are bound only when it is set, and otherwise the host system takes care of placing the threads onto the CPUs.
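
To illustrate why sequential placement could produce the lumping seen in htop, here is a toy model in Python. It is only a sketch under an assumption the thread does not confirm: that every MPI process independently starts from the head of the same ordered core list. The function name `sequential_placement` and the core numbering 0..15 are hypothetical and chosen for illustration.

```python
from collections import Counter

def sequential_placement(n_procs, n_threads, core_list):
    """Toy model: each process pins its thread t to core_list[t].

    Assumption (not confirmed by the thread): every process walks the
    ordered core list from the start, unaware of the other processes.
    """
    placement = {}
    for rank in range(n_procs):
        for t in range(n_threads):
            placement[(rank, t)] = core_list[t % len(core_list)]
    return placement

# 2 sockets x 8 cores, as on the dual Opteron 6128 machine in the question
cores = list(range(16))

# Two processes with two threads each, as in the reported run
placement = sequential_placement(n_procs=2, n_threads=2, core_list=cores)

# Count how many threads end up on each core
load = Counter(placement.values())
for core, n in sorted(load.items()):
    print(f"core {core}: {n} thread(s)")
```

Under this assumption, four threads share two cores while the remaining fourteen stay idle, which matches the symptom described in the original post.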

               

              To get behavior similar to GCC's, the environment variable O64_OMP_SET_AFFINITY needs to be set to FALSE.
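
A minimal sketch of how this setting might be applied to a hybrid run, written as a small Python launcher helper. The helper name `hybrid_launch` and the binary path `./wrk` are hypothetical (the latter borrowed from the instance name in the htop snapshots). The sketch assumes Open MPI's `mpirun`, whose `-x NAME` option exports an environment variable to the launched processes.

```python
import os

def hybrid_launch(binary, n_procs, n_threads, base_env=None):
    """Return (argv, env) for an Open MPI launch with Open64 pinning disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    # Value suggested in the reply above: let the host system place threads
    env["O64_OMP_SET_AFFINITY"] = "FALSE"
    # -x NAME asks mpirun to export NAME from the caller's environment
    argv = ["mpirun", "-np", str(n_procs),
            "-x", "OMP_NUM_THREADS",
            "-x", "O64_OMP_SET_AFFINITY",
            binary]
    return argv, env

# Two processes with two threads each, as in the reported run
argv, env = hybrid_launch("./wrk", n_procs=2, n_threads=2, base_env={})
print(" ".join(argv))
```

To actually start the job, the pair can be handed to `subprocess.run(argv, env=env)`, with `base_env` left at its default so that `PATH` and the rest of the caller's environment are preserved.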

               

              Let me know if this helps you.

               

              Regards,

              Santosh