Hi All,
Recently we tried to use an RX 550 card in our RISC-V CPU mainboard, and the driver reported "*ERROR* ring gfx test failed (-110)".
Our development environment is Ubuntu 22.04 with Linux kernel 5.19.17.
What is the cause, and how can we resolve this issue? The error log follows:
[ 21.553620] [drm] amdgpu kernel modesetting enabled.
[ 21.559356] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[ 21.565663] [drm] initializing kernel modesetting (POLARIS12 0x1002:0x699F 0x1002:0x0B04 0xC7).
[ 21.574411] [drm] register mmio base: 0x70000000
[ 21.579036] [drm] register mmio size: 262144
[ 21.583592] [drm] original adev->gfx_timeout=2500
[ 21.583601] [drm] index=0
[ 21.588326] [drm] adev->gfx_timeout=2500
[ 21.590951] [drm] adev->compute_timeout=15000
[ 21.594876] [drm] adev->sdma_timeout=2500
[ 21.599234] [drm] adev->video_timeout=2500
[ 21.603248] [drm] add ip block number 0 <vi_common>
[ 21.612222] [drm] add ip block number 1 <gmc_v8_0>
[ 21.617017] [drm] add ip block number 2 <tonga_ih>
[ 21.621810] [drm] add ip block number 3 <gfx_v8_0>
[ 21.626602] [drm] add ip block number 4 <sdma_v3_0>
[ 21.631480] [drm] add ip block number 5 <powerplay>
[ 21.636359] [drm] add ip block number 6 <dm>
[ 21.640630] [drm] add ip block number 7 <uvd_v6_0>
[ 21.645423] [drm] add ip block number 8 <vce_v3_0>
[ 21.979827] amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 21.986136] amdgpu: ATOM BIOS: 113-550-4GY
[ 21.990332] [drm] UVD is enabled in VM mode
[ 21.994525] [drm] UVD ENC is enabled in VM mode
[ 21.999059] [drm] VCE enabled in VM mode
[ 22.002988] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 22.011084] amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[ 22.018023] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 22.026568] amdgpu 0000:01:00.0: BAR 2: releasing [mem 0x4110000000-0x41101fffff 64bit pref]
[ 22.035058] amdgpu 0000:01:00.0: BAR 0: releasing [mem 0x4100000000-0x410fffffff 64bit pref]
[ 22.043562] pcieport 0000:00:00.0: BAR 15: releasing [mem 0x4100000000-0x4117ffffff 64bit pref]
[ 22.052295] pcieport 0000:00:00.0: BAR 15: assigned [mem 0x4100000000-0x427fffffff 64bit pref]
[ 22.060921] amdgpu 0000:01:00.0: BAR 0: assigned [mem 0x4100000000-0x41ffffffff 64bit pref]
[ 22.069293] amdgpu 0000:01:00.0: BAR 2: assigned [mem 0x4200000000-0x42001fffff 64bit pref]
[ 22.077666] pcieport 0000:00:00.0: PCI bridge to [bus 01]
[ 22.083070] pcieport 0000:00:00.0: bridge window [io 0x1000-0x1fff]
[ 22.089604] pcieport 0000:00:00.0: bridge window [mem 0x4070000000-0x40700fffff]
[ 22.097175] pcieport 0000:00:00.0: bridge window [mem 0x4100000000-0x427fffffff 64bit pref]
[ 22.105718] amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 22.115290] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 22.123645] [drm] Detected VRAM RAM=4096M, BAR=4096M
[ 22.128611] [drm] RAM width 128bits GDDR5
[ 22.200436] [drm] amdgpu: 4096M of VRAM memory ready
[ 22.205475] [drm] amdgpu: 3973M of GTT memory ready.
[ 22.210669] [drm] GART: num cpu pages 65536, num gpu pages 65536
[ 22.217926] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 22.226776] cdns_pcie_irq_domain_alloc[569]virq=15, nr_irqs=1, bit=2
[ 22.233223] cdns-pcie-host 7060000000.pcie: msi#2 address_hi 0x0 address_lo 0xfe200008
[ 22.241714] [drm] Chained IB support enabled!
[ 22.247082] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.252728] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.258273] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.263806] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.269333] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.274856] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.280377] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.285898] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.291416] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.296966] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.304091] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.309637] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.315194] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[ 22.321385] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[ 22.327668] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.348471] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.354039] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.359947] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[ 22.366919] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.372457] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.377976] amdgpu_ring_init ENTER amdgpu_bo_create_kernel
[ 22.625604] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 22.652190] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
[ 22.677694] amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
[ 22.684143] amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
[ 22.690512] amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
[ 22.699034] amdgpu: probe of 0000:01:00.0 failed with error -110
[ 22.705110] Unable to handle kernel access to user memory without uaccess routines at virtual address 0000000000000090
[ 22.715821] Oops [#1]
[ 22.718095] Modules linked in: amdgpu(+) gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm backlight cec
[ 22.734211] CPU: 62 PID: 490 Comm: systemd-udevd Not tainted 5.19.17+ #1
[ 22.740910] Hardware name: sophgo mango (DT)
[ 22.745176] epc : drm_sched_fini+0xae/0xf4 [gpu_sched]
[ 22.750447] ra : drm_sched_fini+0xa8/0xf4 [gpu_sched]
[ 22.755672] epc : ffffffff02087572 ra : ffffffff0208756c sp : ffffffd9118d3880
[ 22.762888] gp : ffffffff81a364e8 tp : ffffffd9118b5400 t0 : 0000000000000002
[ 22.770102] t1 : 0000000000000000 t2 : 0000000000000040 s0 : ffffffd9118d38d0
[ 22.777316] s1 : ffffffd918f496f8 a0 : ffffffd918f496e8 a1 : ffffffd9112f4fa8
[ 22.784529] a2 : 0000000000000001 a3 : ffffffd918f40014 a4 : 0000000000000001
[ 22.791742] a5 : 0000000000000000 a6 : 00110ffff0000201 a7 : 00000000ff000000
[ 22.798955] s2 : 0000000000000001 s3 : ffffffd918f49650 s4 : ffffffff81abb910
[ 22.806168] s5 : ffffffd918f496e8 s6 : 0000000000000010 s7 : ffffffd918f49658
[ 22.813381] s8 : ffffffff80e6f548 s9 : ffffffff02c46740 s10: ffffffff81a363f4
[ 22.820594] s11: ffffffff8142fd70 t3 : 0000000000000002 t4 : 0000000000000402
[ 22.827806] t5 : ffffffd8ffd1ba58 t6 : ffffffd900c6c902
[ 22.833111] status: 0000000200000120 badaddr: 0000000000000090 cause: 000000000000000f
[ 22.841022] [<ffffffff02873878>] amdgpu_fence_driver_sw_fini+0xe4/0xe6 [amdgpu]
[ 22.865019] [<ffffffff0286254e>] amdgpu_device_fini_sw+0x3c/0x31c [amdgpu]
[ 22.888609] [<ffffffff0286809e>] amdgpu_driver_release_kms+0x26/0x38 [amdgpu]
[ 22.912456] [<ffffffff01c1634c>] devm_drm_dev_init_release+0x50/0x7c [drm]
[ 22.921235] [<ffffffff80664338>] devm_action_release+0x1e/0x26
[ 22.927078] [<ffffffff806647e8>] release_nodes+0x52/0xa6
[ 22.932388] [<ffffffff806659be>] devres_release_all+0x8c/0xbc
[ 22.938132] [<ffffffff8065fda4>] really_probe+0x104/0x364
[ 22.943528] [<ffffffff806600c6>] __driver_probe_device+0xc2/0x110
[ 22.949616] [<ffffffff80660154>] driver_probe_device+0x40/0xcc
[ 22.955444] [<ffffffff80660914>] __driver_attach+0x9a/0x1d8
[ 22.961013] [<ffffffff8065d8cc>] bus_for_each_dev+0x62/0xa0
[ 22.966582] [<ffffffff8065f5b4>] driver_attach+0x2e/0x36
[ 22.971891] [<ffffffff8065f03c>] bus_add_driver+0x140/0x202
[ 22.977459] [<ffffffff8066131e>] driver_register+0x64/0x104
[ 22.983027] [<ffffffff8058499c>] __pci_register_driver+0x54/0x5c
[ 22.989037] [<ffffffff04db2080>] amdgpu_init+0x80/0x1000 [amdgpu]
[ 23.011812] [<ffffffff8000279c>] do_one_initcall+0x44/0x1ac
[ 23.017387] [<ffffffff8009fa0a>] do_init_module+0x54/0x1fe
[ 23.022875] [<ffffffff800a180e>] load_module+0x1ac6/0x1df8
[ 23.028357] [<ffffffff800a1d6e>] __do_sys_finit_module+0x98/0xf4
[ 23.034358] [<ffffffff800a1dee>] sys_finit_module+0x24/0x2c
[ 23.039925] [<ffffffff80003a20>] ret_from_syscall+0x0/0x2
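For reference when reading the log above: -110 is -ETIMEDOUT in Linux's generic errno numbering (the one used on x86 and RISC-V), which suggests the gfx ring test timed out waiting for the GPU rather than failing on memory allocation or firmware loading. The sketch below pulls the error codes out of log lines like these and names them. The errno table is a small hand-copied subset and the function name decode_failures is made up for this example.

```python
import re

# Subset of Linux asm-generic errno values (assumption: the platform uses the
# generic numbering, as x86 and RISC-V do; a few architectures differ).
LINUX_ERRNO = {
    5: "EIO",
    12: "ENOMEM",
    16: "EBUSY",
    19: "ENODEV",
    22: "EINVAL",
    110: "ETIMEDOUT",
}

# Matches "failed (-110)", "failed -110" and "failed with error -110".
_FAIL_RE = re.compile(r"failed(?: with error)? \(?-(\d+)\)?")


def decode_failures(log_text):
    """Return a list of (code, name) for every 'failed -N' style log line."""
    results = []
    for line in log_text.splitlines():
        match = _FAIL_RE.search(line)
        if match:
            code = int(match.group(1))
            results.append((code, LINUX_ERRNO.get(code, "unknown")))
    return results


# Three lines copied from the log above.
sample = """\
[ 22.625604] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 22.652190] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
[ 22.699034] amdgpu: probe of 0000:01:00.0 failed with error -110
"""

for code, name in decode_failures(sample):
    print(f"-{code} -> {name}")
```

All three failures carry the same code, so the later hw_init and probe errors are most likely just the ring test timeout being passed up the call chain.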
Hi. I get "*ERROR* ring gfx test failed (-110)" with an RX 570.
[16103.542546] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1787:0x2379 0xEF).
[16103.542591] [drm] register mmio base: 0xC1C00000
[16103.542594] [drm] register mmio size: 262144
[16103.542694] [drm] add ip block number 0 <vi_common>
[16103.542701] [drm] add ip block number 1 <gmc_v8_0>
[16103.542704] [drm] add ip block number 2 <tonga_ih>
[16103.542706] [drm] add ip block number 3 <gfx_v8_0>
[16103.542708] [drm] add ip block number 4 <sdma_v3_0>
[16103.542711] [drm] add ip block number 5 <powerplay>
[16103.542713] [drm] add ip block number 6 <dm>
[16103.542716] [drm] add ip block number 7 <uvd_v6_0>
[16103.542718] [drm] add ip block number 8 <vce_v3_0>
[16103.945199] amdgpu 0000:0a:00.0: amdgpu: Fetched VBIOS from ROM BAR
[16103.945208] amdgpu: ATOM BIOS: 113-D0000301_100
[16103.945240] [drm] UVD is enabled in VM mode
[16103.945243] [drm] UVD ENC is enabled in VM mode
[16103.945248] [drm] VCE enabled in VM mode
[16103.945252] amdgpu 0000:0a:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[16103.945259] amdgpu 0000:0a:00.0: amdgpu: PCIE atomic ops is not supported
[16103.945299] amdgpu 0000:0a:00.0: amdgpu: PCI CONFIG reset
[16103.945430] [drm] GPU posting now...
[16104.068719] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[16104.068785] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_mc.bin
[16104.068803] amdgpu 0000:0a:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[16104.068806] amdgpu 0000:0a:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[16104.068826] [drm] Detected VRAM RAM=8192M, BAR=256M
[16104.068827] [drm] RAM width 256bits GDDR5
[16104.068846] [drm] amdgpu: 8192M of VRAM memory ready
[16104.068848] [drm] amdgpu: 3918M of GTT memory ready.
[16104.068863] [drm] GART: num cpu pages 65536, num gpu pages 65536
[16104.070195] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[16104.070322] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_pfp_2.bin
[16104.070340] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_me_2.bin
[16104.070355] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_ce_2.bin
[16104.070357] [drm] Chained IB support enabled!
[16104.070375] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_rlc.bin
[16104.070473] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_mec_2.bin
[16104.070563] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_mec2_2.bin
[16104.071369] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_sdma.bin
[16104.071394] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_sdma1.bin
[16104.071424] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[16104.071557] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_uvd.bin
[16104.071560] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[16104.086295] amdgpu 0000:0a:00.0: firmware: direct-loading firmware amdgpu/polaris10_vce.bin
[16104.086302] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[16104.489807] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[16104.490038] [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
[16104.490306] amdgpu 0000:0a:00.0: amdgpu: amdgpu_device_ip_init failed
[16104.490309] amdgpu 0000:0a:00.0: amdgpu: Fatal error during GPU init
[16104.490348] amdgpu 0000:0a:00.0: amdgpu: amdgpu: finishing device.
[16104.491660] amdgpu: probe of 0000:0a:00.0 failed with error -110
[16104.913293] [drm] amdgpu: ttm finalized
I used the amdgpu driver for Ubuntu 22.04 with kernel 6.1.0.
In my case, the error was caused by the double cable connection between Thunderbolt 2 on the computer and Thunderbolt 3 on the GPU card. Rescanning the device bus fixed it.
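The post does not say how the rescan was done. One common way on Linux is through the PCI sysfs files, sketched here under that assumption: writing 1 to /sys/bus/pci/devices/<address>/remove detaches a device, and writing 1 to /sys/bus/pci/rescan makes the kernel enumerate the bus again. The function name rescan_pci and its sysfs_root parameter are made up for this example, and the device address is the one from the RX 570 log above.

```python
from pathlib import Path


def rescan_pci(bdf=None, sysfs_root="/sys"):
    """Optionally remove one PCI device, then trigger a bus rescan.

    bdf is a PCI address such as "0000:0a:00.0". sysfs_root exists only so
    the function can be tried against a scratch directory; on a real system
    leave it at "/sys" and run as root.
    Returns the list of sysfs files that were written.
    """
    pci = Path(sysfs_root) / "bus" / "pci"
    written = []

    if bdf is not None:
        remove = pci / "devices" / bdf / "remove"
        # Skip the removal step if the device is not currently enumerated.
        if remove.exists():
            remove.write_text("1")
            written.append(remove)

    rescan = pci / "rescan"
    rescan.write_text("1")
    written.append(rescan)
    return written


# Example (as root), using the GPU address from the log:
#   rescan_pci("0000:0a:00.0")
```

If the amdgpu module is already loaded, the driver should probe the card again once the rescan rediscovers it. If the module is not loaded, it has to be loaded afterwards.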