
GPU fault detected: 147 0x02007702

Question asked by ekran on May 13, 2017
Latest reply on May 17, 2017 by ekran

OS: CentOS 7.3 with properly installed AMD-APP-SDK-v3.0.130.136-GA-linux64.sh and amdgpu-pro-17.10-410326.tar.xz.

 

I am running a cryptocoin mining rig with 5 AMD GPUs (2x RX 580 and 3x RX 480). The mining starts perfectly and runs smoothly for a while (anything from hours to a few days), then boom! This shows up in the system logfile:

 

May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0: GPU fault detected: 147 0x02007702

May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000C40

May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E077002

May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0: VM fault (0x02, vmid 7) at page 3136, read from 'SDM0'

 

The mining stops and I have to reboot the rig in order to make things run again. Question is, which GPU is this? Is this a driver issue or could it be something wrong with my hardware?
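On the "which GPU" part: each log line carries a PCI address (0000:04:00.0 here), which should identify the card that faulted. Below is a rough Python sketch, assuming the log format is exactly as pasted above. The names `FAULT_RE`, `parse_vm_faults` and `sysfs_path` are made up for illustration; `/sys/bus/pci/devices/` is the standard Linux sysfs location for PCI devices. It only pulls the fields out of the log and does not say anything about whether the cause is the driver or the hardware.

```python
import re

# Log lines copied from the report above.
LOG = """\
May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0: GPU fault detected: 147 0x02007702
May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000C40
May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E077002
May 13 13:13:06 agamemnon kernel: amdgpu 0000:04:00.0: VM fault (0x02, vmid 7) at page 3136, read from 'SDM0'
"""

# Matches the "VM fault" summary line; assumes the wording shown in the log above.
FAULT_RE = re.compile(
    r"amdgpu (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"VM fault \(0x[0-9a-fA-F]+, vmid (?P<vmid>\d+)\) at page (?P<page>\d+), "
    r"(?P<access>read|write) from '(?P<client>[^']+)'"
)


def parse_vm_faults(text):
    """Return one dict per 'VM fault' line found in the given log text."""
    faults = []
    for line in text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            faults.append({
                "pci": m.group("pci"),
                "vmid": int(m.group("vmid")),
                "page": int(m.group("page")),
                "access": m.group("access"),
                "client": m.group("client"),
            })
    return faults


def sysfs_path(pci_address):
    """Build the sysfs directory for a PCI address (path is only constructed, not read)."""
    return "/sys/bus/pci/devices/" + pci_address


if __name__ == "__main__":
    for fault in parse_vm_faults(LOG):
        print(fault["pci"], "->", sysfs_path(fault["pci"]))
        print("  vmid", fault["vmid"], "page", fault["page"],
              fault["access"], "from", fault["client"])
```

The address printed here can be compared with the output of `lspci -s 04:00.0` to see which slot and card it belongs to. The page number in the last log line is just the FAULT_ADDR register value in decimal (0x00000C40 = 3136).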

 

I have seen people getting the same error elsewhere (of course I googled it first), but I haven't seen anyone with a good solution to this.

 

Any help or tips would be appreciated.

 

Rig info:

CPU: AMD Athlon II X4 860K Black Processor - 3.7 GHz

RAM: Corsair 4GB DDR3 1600MHz Vengeance
MB: ASRock FM2A88X+ BTC Motherboard - AMD A88X
Disk: 1TB SATA Seagate
GFX: Sapphire Radeon RX 580 8GB Pulse
GFX: Sapphire Radeon RX 580 8GB Pulse
GFX: Sapphire Radeon RX 480 4GB NITRO+
GFX: Sapphire Radeon RX 480 4GB NITRO+
GFX: Sapphire Radeon RX 480 4GB NITRO+
PSU: XFX ProSeries XXX Edition 850W Bronze
PSU: Corsair VS650, 650W PSU

OS: CentOS 7.3
Case: Custom + 5 USB/PCI-e Risers
