2 Replies Latest reply on Sep 26, 2011 3:43 AM by svs1

    How to scale DGEMM?

      On a 6-core CPU, DGEMM runs in a single thread even though OMP_NUM_THREADS=6 is set


      I am evaluating ACML, version 4.4.

      I am using the PGI OpenMP build of ACML (pgi32_mp\lib\libacml_mp_dll.lib) from a 32-bit application built with an OpenMP-unaware compiler (C++ Builder). With OMP_NUM_THREADS=6 set on a 64-bit Windows 7 computer with an AMD Phenom II X6 1090T CPU, DGEMM does not scale to multiple cores. The matrix size is 400×400.

      Is this because of the compiler's lack of OpenMP support, or because of something else?

      Thank you

        • How to scale DGEMM?

          The primary issue here is that you have to use an OpenMP-aware compiler and linker in order to use OpenMP features.  Our library makes calls to the PGI OpenMP runtime and expects the OpenMP startup code to be in place.  The compiler (for the main program) and the linker put all of this in place.  I don't expect that C++ Builder can meet these criteria.
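
          As a rough way to check whether a given compiler enables OpenMP at all, the sketch below relies on the standard `_OPENMP` macro and on `omp_get_max_threads()` from `<omp.h>`. The helper names (`built_with_openmp`, `max_openmp_threads`, `report_openmp_status`) are hypothetical and not part of ACML. The check only reports on the application's own build; it does not prove that the PGI runtime used by the library has been started correctly.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>   // only present when the compiler has OpenMP enabled
#endif

// Hypothetical helper: true if this file was compiled with OpenMP enabled.
// _OPENMP is defined by OpenMP-compliant compilers when the feature is on.
bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: upper bound on the threads a parallel region in this
// build could use. Without OpenMP there is no runtime to ask, so report 1.
int max_openmp_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();  // honours OMP_NUM_THREADS if it is set
#else
    return 1;
#endif
}

// Hypothetical helper: print a one-line summary of the two checks above.
void report_openmp_status() {
    std::printf("OpenMP enabled: %s, max threads: %d\n",
                built_with_openmp() ? "yes" : "no",
                max_openmp_threads());
}
```

          Calling `report_openmp_status()` from `main` in a build made without OpenMP support should report "no" and a thread count of 1, which would be consistent with the single-threaded DGEMM behaviour described in the question.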

          Look at the makefiles in the examples folder.  These will show the commands (along with the required runtime libraries) necessary to build a working OpenMP executable.

            • How to scale DGEMM?

              Thank you for confirming the reason is lack of OpenMP support in our development tools.

              For comparison, Intel MKL does not have the same limitation. It also uses OpenMP, yet it can be used in a multi-threaded way by OpenMP-unaware applications. This may be an area where ACML could improve.