Dear Gurus,
Once again, a recurring question on running Gaussian16 on AMD CPUs.
Recently, one of my 3960X machines started to experience issues with gaussian16.rev_B01 with avx2 support (precompiled), throwing "Error: illegal instruction, illegal opcode" and "Error: segmentation violation, address not mapped to object" at random moments. I have been trying to diagnose the problem for a few weeks and have run out of ideas. I am running Ubuntu 20.04 LTS server. Here are some hints:
1. I have been using the PGI_FASTMATH_CPU=sandybridge variable on my setup.
2. The same Gaussian executable is exported from the headnode to several machines, and only this one gives me issues. The issues started recently.
3. I have stress-tested the CPU with mprime (>24h) and done long runs with cp2k. No errors were detected. Both use avx2 routines.
4. I have tested memory with memtest, and physically removed half of the sticks to see whether that alleviates the issues. Still getting the errors.
5. I have reinstalled the OS.
Now, I am extremely confused by the conflicting results. On one hand, the same executable gives me errors on only one out of a few machines, which are copies of each other. On the other hand, every other task on that machine runs without any problem. I know I am reaching a bit outside of your expertise, but maybe you can help me diagnose the problem (at least at the software vs hardware level). There is little on the internet beyond PGI_FASTMATH_CPU.
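One check I have in mind for the software vs hardware question is to diff what the kernel reports about the CPU on a healthy node against the failing one. Below is a minimal sketch, assuming the contents of /proc/cpuinfo have been saved from each machine; the function names parse_cpuinfo and compare_nodes are hypothetical, and this only compares the reported instruction-set flags and microcode revision, so it cannot prove or rule out a hardware fault by itself.

```python
def parse_cpuinfo(text):
    """Return (set of CPU flags, microcode revision) from /proc/cpuinfo text.

    Only the first "flags" and "microcode" entries are used, assuming all
    cores of the node report the same values.
    """
    flags, microcode = set(), None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key, value = key.strip(), value.strip()
        if key == "flags" and not flags:
            flags = set(value.split())
        elif key == "microcode" and microcode is None:
            microcode = value
    return flags, microcode


def compare_nodes(good_text, bad_text):
    """Summarize differences between a healthy node and the failing node."""
    good_flags, good_ucode = parse_cpuinfo(good_text)
    bad_flags, bad_ucode = parse_cpuinfo(bad_text)
    return {
        # Flags the healthy node reports but the failing node does not.
        "missing_on_bad": sorted(good_flags - bad_flags),
        # Flags only the failing node reports.
        "extra_on_bad": sorted(bad_flags - good_flags),
        # (healthy, failing) microcode revisions as reported by the kernel.
        "microcode": (good_ucode, bad_ucode),
    }
```

To use it, one would read the two saved files and pass their contents to compare_nodes. An empty "missing_on_bad" list together with identical microcode revisions would suggest the nodes look the same to the kernel, which would point away from a configuration difference.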
Best,
~M.