Hi community, I'm writing because I'm seeing strange behavior with the acmlsetnumthreads function.
I set the max number of threads using the following statement:
Then, when I execute a DGEMM or a DGETF2 call, for example, the 'top' command shows that only 4 processors are being used.
When the call is to DTRSM, however, I see that almost all of my 32 processors are working. That makes me think this function is probably running more than 4 threads.
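Since 'top' only shows which processors are busy, one way to check the thread count directly might be to read it from inside the process. Below is a minimal sketch, assuming Linux (it reads the "Threads:" field of /proc/self/status). The helper name count_process_threads is hypothetical and is not part of ACML.

```c
#include <stdio.h>

/* Hypothetical helper (not an ACML routine): returns the number of threads
   in the calling process as reported by Linux in /proc/self/status,
   or -1 if the file cannot be read or the field is not found. */
int count_process_threads(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    int threads = -1;

    if (fp == NULL)
        return -1;

    /* Scan line by line until the "Threads:" entry is parsed. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "Threads: %d", &threads) == 1)
            break;
    }

    fclose(fp);
    return threads;
}
```

Calling this before and after a DTRSM call, and again after a DGEMM call, would show whether more threads were created than requested. It counts every thread that exists in the process, including idle ones, so it reports how many threads were spawned rather than how many are busy.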
I'm trying to build a thread pool with nested parallelism and affinity, but execution is very slow when two or more xTRSM calls run at the same time.
Can somebody shed some light on this?
I'm using ACML with GCC on openSUSE. If any other information would be helpful, please tell me.
Thanks for reading!