I have recently upgraded to ACML 188.8.131.52 (I was previously using 5.3.0) and have run into a few problems. First, I could find no documentation on how to properly install ACML6, so I assumed the contents of the downloaded tar file simply go into what would be the install prefix. I checked the documentation in acml-184.108.40.206-gfortran64/Doc/html/BestLibrary.html and found that it mentions install paths that do not appear in the tar file, in particular the "fma4"-suffixed items. Is the documentation saying that those directories are the ideal location for the files to be installed, or are they remnants from version 5, which supplied the "_fma4" directories?
The issues I've had testing ACML6 performance have been with HPL 2.1 compiled using gcc-4.8.2 and either OpenMPI-1.8.2 or MVAPICH2-2.0. What I've found is that the desired number of threads is spawned, but only one core is being used. I'm curious what can be done to debug this, or what information I can provide to help find the cause. Compiling HPL against a library such as OpenBLAS in the same way, changing only the LAinc and LAdir options, does not show the same problem of binding to a single core.
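One thing that could be checked here is which CPUs each MPI rank is actually allowed to run on, since threads confined to a one-CPU affinity mask would all share that core. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the function name show_affinity is made up for illustration and is not part of ACML, HPL, or either MPI.

```shell
#!/bin/sh
# Hypothetical helper: print the CPUs this process may run on (Linux only).
# Cpus_allowed_list is a kernel-provided field in /proc/<pid>/status; the
# grep child inherits the affinity mask of this shell, so reading
# /proc/self/status from grep reports the same mask.
show_affinity() {
    allowed=$(grep '^Cpus_allowed_list:' /proc/self/status 2>/dev/null | awk '{print $2}')
    echo "host=$(uname -n) pid=$$ allowed_cpus=${allowed} OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
}

show_affinity
```

Launching this script with the same mpirun command line used for xhpl would print one line per rank. If allowed_cpus shows a single CPU, the restriction may be coming from the MPI launcher rather than from ACML itself. In that case, Open MPI's --report-bindings and --bind-to none options to mpirun, or MVAPICH2's MV2_ENABLE_AFFINITY=0 environment variable, would be the next things to try.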
HPL Makefile for ACML:
MPdir = $(MPIHOME)
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib64 -lmpi
LAdir = $(ACML_MP_ROOT)
LAinc = -I$(LAdir)/include
LAlib = -L$(LAdir)/lib -lacml_mp
CC = mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -funroll-loops -O3 -mfma4 -W -Wall -lpthread
LINKER = $(CC)
LINKFLAGS = $(CCFLAGS)
ACML_MP_ROOT = /apps/gcc-4.8.2/acml-gfortran64/220.127.116.11/gfortran64_mp