I have been getting good results with acml_mp on an 8-core Opteron system, calling dsyev on matrices of roughly 5000 x 5000, and I can see multiple threads running in htop.
I tried switching to ssyev to see whether this would speed up the calculation, but for some reason the routine now uses only a single thread and takes roughly twice as long to run. Is this normal? I can understand that single-precision arithmetic on a 64-bit machine might have issues that prevent it from running faster, but surely it should still be able to exploit multiple threads?
I have downloaded the newest version of ACML and double-checked that I am linking against the mp version of the library.
(I checked again. To clarify: it does spawn multiple threads, but only one appears to be doing any actual work.)
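In case it helps with reproducing this, here is a rough sketch of how the "only one thread is doing work" observation could be put into numbers instead of being read off htop. It assumes a Linux system with /proc mounted (the OS is not stated above, so that is an assumption), and the helper names `parse_thread_cpu_ticks` and `thread_cpu_seconds` are made up for illustration. It reads the accumulated user and system CPU time of every thread of a running process.

```python
import os


def parse_thread_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/task/<tid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so only the part after the LAST ')' is split into fields.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]), int(rest[12])


def thread_cpu_seconds(pid):
    """Map each thread ID of `pid` to its accumulated CPU seconds (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    task_dir = "/proc/%d/task" % pid
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                utime, stime = parse_thread_cpu_ticks(f.read())
        except OSError:
            continue  # the thread exited while we were reading
        result[int(tid)] = (utime + stime) / ticks_per_second
    return result
```

Calling `thread_cpu_seconds(pid)` on the running program twice, a few seconds apart, and comparing the two results should show which threads are actually accumulating CPU time. With dsyev I would expect several thread IDs to advance; with ssyev, if the behaviour is as described, only one would.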
We'll work on duplicating this. Which OS and compiler are you using?
And which ACML version?