eazy-f
Journeyman III

Fan speed and driver restarts on Linux with 7900 XTX

The AMDGPU driver restarts and takes X11 down with it, usually under load but sometimes during regular browsing with no significant load. The restart is visible in the dmesg output. The issue is easy to reproduce with any significant load, such as the glmark2 terrain benchmark or sustained heavy OpenCL usage. Under load, the junction temperature rises to 80-85 C while the fans spin at 1000-1200 RPM and no faster. The fan speed seems locked; within minutes the temperature can reach 90 C, and the driver eventually restarts.

There is currently no way to control the fan speed curve on Linux with these GPUs, so the issue is quite frustrating. It looks like a simple driver bug that leaves the card unable to prevent overheating.
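
For anyone who wants to log the same readings while reproducing this, here is a minimal Python sketch. It assumes the amdgpu driver exposes its usual hwmon files (fan1_input, tempN_label, tempN_input) in a directory such as /sys/class/drm/card0/device/hwmon/hwmon*; the card index and the function names (read_sensors, find_amdgpu_hwmon) are my own assumptions, not something taken from the driver package.

```python
from pathlib import Path


def read_sensors(hwmon_dir):
    """Read fan RPM and labelled temperatures (deg C) from one hwmon directory."""
    hwmon = Path(hwmon_dir)
    readings = {}

    fan = hwmon / "fan1_input"
    if fan.exists():
        readings["fan_rpm"] = int(fan.read_text().strip())

    # Each tempN_label file names the sensor whose value is in tempN_input.
    for label_file in sorted(hwmon.glob("temp*_label")):
        input_file = hwmon / label_file.name.replace("_label", "_input")
        if input_file.exists():
            label = label_file.read_text().strip()
            # hwmon reports temperatures in millidegrees Celsius.
            readings[label] = int(input_file.read_text().strip()) / 1000.0
    return readings


def find_amdgpu_hwmon(card="card0"):
    """Return the first hwmon directory for the given card, or None.

    The sysfs layout and the card index are assumptions; adjust for your system.
    """
    base = Path("/sys/class/drm") / card / "device" / "hwmon"
    dirs = sorted(base.glob("hwmon*")) if base.exists() else []
    return dirs[0] if dirs else None
```

Calling read_sensors(find_amdgpu_hwmon()) in a loop while the benchmark runs should show whether the fan RPM stays flat as the junction temperature climbs.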

OS: Ubuntu 20.04

Driver version: 22.40.3

Update with dmesg output:

[ 7667.626261] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 20241 thread firefox:cs0 pid 20374
[ 7668.640040] amdgpu 0000:03:00.0: amdgpu: IP block:gfx_v11_0 is hung!
[ 7668.640079] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 7668.640741] gmc_v11_0_process_interrupt: 87 callbacks suppressed
[ 7668.640743] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:174 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 7668.640745] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 7668.640747] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B5D
[ 7668.640748] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 7668.640749] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1
[ 7668.640750] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x6
[ 7668.640751] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5
[ 7668.640751] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 7668.640752] amdgpu 0000:03:00.0: amdgpu: RW: 0x1
[ 7668.640757] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:174 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 7668.640758] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 7668.640759] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7668.640760] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 7668.640761] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 7668.640762] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7668.640763] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 7668.640764] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7668.640765] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[ 7668.640770] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:174 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 7668.640771] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 7668.640772] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7668.640773] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 7668.640774] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 7668.640774] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7668.640775] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 7668.640776] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7668.640777] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[ 7668.640782] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:174 vmid:0 pasid:0, for process pid 0 thread pid 0)
[ 7668.640783] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
[ 7668.640784] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7668.640785] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CB/DB (0x0)
[ 7668.640786] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
[ 7668.640787] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
[ 7668.640788] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x0
[ 7668.640788] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 7668.640789] amdgpu 0000:03:00.0: amdgpu: RW: 0x0
[ 7668.912203] Failed to wait all pipes clean
[ 7668.912205] amdgpu 0000:03:00.0: amdgpu: soft reset failed, will fallback to full reset!
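
To keep track of how often this happens, a rough Python sketch that scans dmesg text for the "is hung!" and "GPU reset begin!" messages seen above may help. The patterns are based only on the lines in this log and are an assumption for other kernel or driver versions; summarize_resets is a hypothetical helper name.

```python
import re

# Patterns derived from the log lines in this post; other versions may differ.
HUNG_RE = re.compile(r"amdgpu: IP block:(\S+) is hung!")
RESET_RE = re.compile(r"amdgpu: GPU reset begin!")
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize_resets(dmesg_text):
    """Return one entry per GPU reset: its timestamp and the hung IP block, if any."""
    events = []
    hung_block = None
    for line in dmesg_text.splitlines():
        hung = HUNG_RE.search(line)
        if hung:
            # Remember the block so it can be attached to the next reset.
            hung_block = hung.group(1)
            continue
        if RESET_RE.search(line):
            ts = TS_RE.match(line)
            events.append({
                "time": float(ts.group(1)) if ts else None,
                "hung_block": hung_block,
            })
            hung_block = None
    return events
```

Feeding it the captured output of dmesg gives a list whose length is the number of resets since boot, which makes it easier to compare runs with different loads.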

 

 
