How to detect crashed GPUs and reset them

Question asked by jungle on Aug 1, 2019

I am looking for a solution for a datacenter with AMD GPUs that lets me detect when cards are frozen or crashed and reset them.

Sometimes I think the card is still functioning but doesn't output a display, and sometimes it is fully dead, so I would like to be able to detect both cases.

 

Is there something similar to NVIDIA's DCGM?

 

Or is there a good way to write this manually, for both Linux and Windows?
