I'm running Ubuntu 18.04 with the latest 19.10 drivers and kernel 4.15.0-48-generic.
I'm experiencing hard reboots under heavy load on the GPU. There is absolutely nothing in the system log, since it looks like an NMI.
The reboot happens whether the GPU power draw is 80 W or 200 W (with a Radeon VII).
This only happens under Linux. I've tried to reproduce it under Windows, but it never happens there.
I have also tried the 18.50 drivers with exactly the same results.
I've also tried kernels 4.18 and 5.1, with the same behavior.
Before the Radeon VII I was using 2x RX 580, each at a constant 100 W power draw, and I never experienced a reboot like this before switching to the new Radeon VII.
Thanks in advance
Finally, I got the same behavior on Windows while running an OpenCL stress test, and again while playing The Division 2. It seems the GPU is the faulty part.
I've tried on different computers, and the GPUs seem to be faulty.
I've bought 4 Radeon VIIs: one never worked at all (the computer won't POST) and 2 were defective. It looks like there is no QC at all.
700€ per card, and 3 weren't working correctly... I'm a bit disappointed.
I will send them back, again, and hope to get new ones that work this time (I hope...).
Thanks to all for helping
Can you compare the voltage in Wattman between Linux and Windows? Perhaps there is a difference in the defaults. The only time I have experienced a reboot with the VII like the one you mention was when I went a little aggressive with the undervolting, at 990 mV.
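If it helps with the comparison, here is a rough Python sketch for reading the Linux-side values to set against what Wattman shows on Windows. It assumes the Radeon VII is `card0` and that the amdgpu kernel driver exposes its usual hwmon files (`in0_input` for GPU voltage in millivolts, `power1_average` in microwatts) plus `pp_dpm_sclk`. The helper names are my own, and on some kernel/driver combinations these files may simply be missing.

```python
from pathlib import Path

# Assumption: the Radeon VII is card0; change this on multi-GPU systems.
DEVICE = Path("/sys/class/drm/card0/device")


def parse_millivolts(raw):
    """hwmon voltage files (in0_input) report millivolts."""
    return float(raw.strip())


def parse_watts(raw):
    """hwmon power files (power1_average) report microwatts."""
    return int(raw.strip()) / 1_000_000


def active_clock(pp_dpm_sclk_text):
    """Return the clock level marked with '*' in pp_dpm_sclk, or None."""
    for line in pp_dpm_sclk_text.splitlines():
        if line.rstrip().endswith("*"):
            # Lines look like "1: 1801Mhz *"
            return line.split(":", 1)[1].replace("*", "").strip()
    return None


def read_text(path):
    """Read a sysfs file, returning None if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


def report():
    """Print voltage, power and active core clock, or n/a where unavailable."""
    hwmon_dirs = sorted(DEVICE.glob("hwmon/hwmon*"))
    hwmon = hwmon_dirs[0] if hwmon_dirs else None
    volts = read_text(hwmon / "in0_input") if hwmon else None
    power = read_text(hwmon / "power1_average") if hwmon else None
    sclk = read_text(DEVICE / "pp_dpm_sclk")
    print("GPU voltage:", f"{parse_millivolts(volts):.0f} mV" if volts else "n/a")
    print("GPU power:  ", f"{parse_watts(power):.1f} W" if power else "n/a")
    print("Core clock: ", (active_clock(sclk) or "n/a") if sclk else "n/a")


if __name__ == "__main__":
    report()
```

Run it once at idle and once while the load is running, then compare the numbers with the Wattman readings taken under the same load on Windows.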
I'm running at stock clocks, 1009 mV @ 1801 MHz.
I have more information.
First, it depends directly on the OpenCL version pulled in by the amdgpu-pro driver.
- 18.50-725072: LuxMark (ball, 50830 pts) and Superposition (1080p medium, 9486 pts) run correctly and complete; reboots are still observed during long OpenCL compute runs
- 18.50-756341: LuxMark (ball, 51304 pts) and Superposition (1080p medium, 9487 pts) run correctly and complete; reboots are still observed during long OpenCL compute runs
- 19.10-785425: with LuxMark or Superposition, the computer reboots a few seconds after the benchmark starts.
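Since the reboots leave nothing in the system log, one way to see what the card was doing right before a reset is to sample its sensors to a file and force every line to disk. Below is a minimal Python sketch of that idea, not a tested diagnostic tool. It assumes the card is `card0` and that the amdgpu hwmon files `power1_average` (microwatts), `in0_input` (millivolts) and `temp1_input` (millidegrees Celsius) are present. The function names and the `gpu_trace.log` path are hypothetical.

```python
import os
import time
from pathlib import Path


def append_durably(log_path, line):
    """Append one line and fsync so it survives a sudden reboot."""
    with open(log_path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


def sample(device):
    """Collect whichever hwmon readings exist under the device directory."""
    readings = {}
    for name in ("power1_average", "in0_input", "temp1_input"):
        for path in Path(device).glob(f"hwmon/hwmon*/{name}"):
            try:
                readings[name] = path.read_text().strip()
            except OSError:
                pass  # sensor not readable on this kernel/driver
    return readings


def run(device="/sys/class/drm/card0/device", log_path="gpu_trace.log",
        interval=0.5, samples=None):
    """Log raw sensor values every `interval` seconds.

    With samples=None the loop runs until interrupted (or the machine reboots).
    """
    count = 0
    while samples is None or count < samples:
        values = sample(device)
        fields = " ".join(f"{k}={v}" for k, v in sorted(values.items()))
        append_durably(log_path, f"{time.time():.3f} {fields}")
        count += 1
        time.sleep(interval)
```

Calling `run()` in a separate terminal before launching LuxMark or the long OpenCL job should leave the last readings before the reboot at the end of `gpu_trace.log`. That would show whether the resets line up with a particular power, voltage or temperature level on each driver version.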