How to scale DGEMM?

Discussion created by svs1 on Sep 23, 2011
Latest reply on Sep 26, 2011 by svs1
On a 6-core CPU, DGEMM runs in a single thread even though OMP_NUM_THREADS=6 is set


I am evaluating ACML, version 4.4.

I am using the PGI OpenMP build of ACML (pgi32_mp\lib\libacml_mp_dll.lib) from a 32-bit application built with a compiler that has no OpenMP support (C++ Builder). The machine runs 64-bit Windows 7 on an AMD Phenom II X6 1090T CPU. With OMP_NUM_THREADS=6 set, DGEMM does not scale to multiple cores. The matrices are 400 x 400.

Is this because the compiler lacks OpenMP support, or is something else the cause?
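For reference, here is a minimal sketch of how the scaling could be checked. It is not the actual test program: all function names are hypothetical, the naive multiply is only a stand-in for the library's DGEMM call (whose exact signature should be taken from the ACML header), and the timer assumes a C11 runtime that provides timespec_get. It reports whether OMP_NUM_THREADS is visible to the process and the measured GFLOP/s for a square multiply.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds (C11 timespec_get). A CPU-time clock would add up
   the time of all threads and hide any speedup. */
double wall_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Floating-point operations in C = A*B for n x n matrices: 2*n^3. */
double gemm_flop_count(int n) {
    return 2.0 * (double)n * (double)n * (double)n;
}

/* Hypothetical stand-in for the library call: naive column-major C = A*B.
   Replace the body with the real DGEMM call when measuring the library. */
void multiply_stand_in(int n, const double *a, const double *b, double *c) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;
        }
    }
}

/* Average GFLOP/s over reps multiplications; -1.0 if allocation fails. */
double measure_gflops(int n, int reps) {
    size_t count = (size_t)n * (size_t)n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c = malloc(count * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < count; ++i) {
        a[i] = 1.0; b[i] = 2.0; c[i] = 0.0;
    }

    double start = wall_seconds();
    for (int r = 0; r < reps; ++r)
        multiply_stand_in(n, a, b, c);
    double elapsed = wall_seconds() - start;
    if (elapsed <= 0.0)
        elapsed = 1e-9;            /* guard against timer resolution */

    volatile double sink = c[0];   /* keep the result observable */
    (void)sink;
    free(a); free(b); free(c);
    return gemm_flop_count(n) * (double)reps / elapsed / 1e9;
}

/* Print the thread setting the process actually sees, then the rate. */
void report(int n, int reps) {
    const char *threads = getenv("OMP_NUM_THREADS");
    printf("OMP_NUM_THREADS=%s\n", threads ? threads : "(not set)");
    printf("n=%d reps=%d: %.3f GFLOP/s\n", n, reps, measure_gflops(n, reps));
}
```

Calling report(400, 50) from main once with OMP_NUM_THREADS=1 and once with OMP_NUM_THREADS=6, then comparing the two GFLOP/s figures, would show whether the call scales. A single 400 x 400 multiply is only 128 million floating-point operations, so the measurement needs many repetitions to be meaningful.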

Thank you