
thesmileman
Journeyman III

Why does the driver lock up computer rather than crash?

I occasionally find that a kernel will lock up the entire machine. Lockups are normal with other vendors too, but there the driver will eventually crash the display adapter after a certain amount of time. It was my understanding that others had the inverse of this problem, where long calculations would time out. Is it possible that a fix for their problem is causing this?

MicahVillmow
Staff

Re: Why does the driver lock up computer rather than crash?

The difference between resetting the card or crashing the driver and hanging the display comes down to an infinite loop or long-running program versus a livelock or deadlock on the card. Because a GPU is not preemptible, there is no way to reset it if your program causes the card to lock up. While your display is no longer available, you can still ssh into the machine.
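To make the distinction above concrete from the host side, here is a minimal sketch of a watchdog that runs a compute job in a child process and reports whether it finished, had to be killed after a deadline, or could not be reaped even after a kill. The function name `run_with_watchdog`, the status strings, and the timeout values are hypothetical and chosen only for illustration; nothing here is part of any AMD driver or SDK. A watchdog like this only helps with a long-running or looping host process. If the card itself is deadlocked, killing the process may not free it, and a reboot may still be needed.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s, reap_s=5.0):
    """Run cmd in a child process and enforce a wall-clock deadline.

    Returns a (status, returncode) tuple where status is one of:
      "finished": the child exited on its own within timeout_s
      "killed":   the child overran the deadline and was killed
      "stuck":    the child did not exit even after being killed
                  (for example, blocked inside a driver call)
    All names and status strings are hypothetical, for illustration.
    """
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait(timeout=timeout_s)
        return ("finished", returncode)
    except subprocess.TimeoutExpired:
        # Deadline passed: ask the OS to terminate the child.
        proc.kill()
        try:
            proc.wait(timeout=reap_s)
            return ("killed", proc.returncode)
        except subprocess.TimeoutExpired:
            # The process could not be reaped; the host cannot recover
            # the job from here without outside intervention.
            return ("stuck", None)


if __name__ == "__main__":
    # Stand-in for a compute job: a child that sleeps past the deadline.
    job = [sys.executable, "-c", "import time; time.sleep(60)"]
    status, code = run_with_watchdog(job, timeout_s=1.0)
    print(status)
```

Running the job in a separate process, rather than a thread, is the design choice that matters here: the parent stays responsive and can report the failure even when the child never returns.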

thesmileman
Journeyman III

Re: Why does the driver lock up computer rather than crash?

Fair enough for the display card, but this happens even when the code is not running on the main display adapter, for example when I have two AMD cards, one driving the display and one doing only compute.

yurtesen
Miniboss

Re: Why does the driver lock up computer rather than crash?

Programs timing out is not a problem at all (there are very flexible workarounds). But when the device crashes, you have to reboot the machine. (I often ssh into the box and reboot it that way; the reboot takes a few minutes because of the stuck process.)

I guess this is why AMD products are not often used for actual GPGPU computing, because having to reboot a cluster would be an undesirable feature.

I vote for timeout over crash. Hopefully AMD will fix this issue before they lose GPGPU computing to competitors...
