I have a program that I need to run thousands of times to compute a Jacobian. I usually do this on Intel processors without major problems (the performance loss is negligible), so I expected similar behavior on this new processor, but that is not what I see. Each additional process causes a significant slowdown: for example, 1 process takes approximately 30 minutes, 2 processes take 40 minutes each, and it keeps getting worse, reaching run times of more than 6 hours when running 40 processes simultaneously.
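To put those timings in perspective, here is a small Python sketch that converts them into throughput (runs completed per hour) and parallel efficiency relative to a single process. The function names are illustrative only, and the 40-process time is treated as exactly 6 hours even though the real figure is higher, so the numbers for that case are optimistic.

```python
# Run times quoted above, in minutes per run, keyed by the number of
# simultaneous processes. The 40-process value is a lower bound
# ("more than 6 hours"), so its throughput and efficiency are upper bounds.
RUN_MINUTES = {1: 30.0, 2: 40.0, 40: 360.0}


def runs_per_hour(n_procs, minutes_per_run):
    """Total runs finished per hour when n_procs run side by side."""
    return n_procs * 60.0 / minutes_per_run


def parallel_efficiency(n_procs, timings=RUN_MINUTES):
    """Single-process run time divided by the run time at n_procs.

    1.0 means adding processes costs nothing; values near 0 mean each
    process is slowed down almost in proportion to the process count.
    """
    return timings[1] / timings[n_procs]


def summarize(timings=RUN_MINUTES):
    """Return (n_procs, runs_per_hour, efficiency) for each measurement."""
    return [
        (n, runs_per_hour(n, timings[n]), parallel_efficiency(n, timings))
        for n in sorted(timings)
    ]


if __name__ == "__main__":
    for n, rate, eff in summarize():
        print(f"{n:3d} processes: {rate:5.2f} runs/hour, efficiency {eff:.2f}")
```

With these figures, going from 1 to 40 processes raises throughput from 2 runs per hour to fewer than 7, which is why the scaling looks so poor.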
I have tested under Windows 10 and Windows Server 2019 (Linux is pending). I have recompiled the code with optimizations for AMD processors, and I have run tests with the affinity set to specific cores (to avoid processes switching cores while running), but nothing seems to work. By contrast, I have done the same work on Intel processors (i5, i7, Xeon), where there is no noticeable loss of performance even when running as many processes as there are cores.
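As an illustration of the kind of pinning described above, here is a minimal Python sketch that builds an affinity bitmask for a set of cores and formats a Windows `start /affinity` command line from it. This is an assumption about how such a test could be set up, not the exact script used; `solver.exe` is a hypothetical executable name, and the sketch only handles core indices 0 to 63 (a single 64-bit mask).

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per logical core index.

    Assumes indices in the range 0..63, i.e. a single 64-bit mask.
    """
    mask = 0
    for core in cores:
        if not 0 <= core < 64:
            raise ValueError(f"core index out of range: {core}")
        mask |= 1 << core
    return mask


def start_command(exe, cores):
    """Build a cmd.exe 'start' command that pins exe to the given cores.

    'start /affinity' takes the mask as a hexadecimal number; the empty
    quoted string is the window title argument of 'start'.
    """
    return f'start "" /affinity {affinity_mask(cores):X} {exe}'


if __name__ == "__main__":
    # One process per core, each pinned to its own logical core.
    # "solver.exe" is a placeholder, not the real program name.
    for core in range(4):
        print(start_command("solver.exe", [core]))
```

Printing the commands rather than launching them keeps the sketch safe to run anywhere; the resulting lines could be pasted into a batch file.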
These tests were carried out on 2 different computers, both with 128 GB of RAM, an NVMe PCIe SSD, dual Quadro RTX 8000 GPUs, etc.
Does anyone have any advice or recommendations? I find it hard to believe that a 64-core processor performs worse (for this particular case) than a 12- or 18-core Intel processor.