
GPU hanging: AMD APP SDK 2.6 / fglrx 11.12

Discussion created by d.a.a. on Dec 17, 2011
Latest reply on May 21, 2012 by dhughes

Hi,

Is it possible to tell from the trace dump below what is going on here? The GPU is a Radeon HD6970, and I'm using Debian GNU/Linux with AMD APP SDK 2.6 and the fglrx 11.12 driver (it also hangs with fglrx 11.11/SDK 2.5). After the hang, the machine is only usable again after a reboot: I can kill neither the OpenCL application (gpocl) nor the X server. The actual OpenCL kernel is a bit complicated, but it worked before (at least with fglrx 10.12/AMD SDK 2.3) and it runs smoothly on Nvidia GPUs.

[ 495.045428] [fglrx] ASIC hang happened
[ 495.045431] Pid: 2684, comm: gpocl Tainted: P O 3.1.0-1-amd64 #1
[ 495.045432] Call Trace:
[ 495.045458] [<ffffffffa02e4e4c>] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
[ 495.045493] [<ffffffffa0371084>] ? _ZN18mmEnginesContainer9timestampEP26_QS_MM_TIMESTAMP_PACKET_INP27_QS_MM_TIMESTAMP_PACKET_OUT+0x184/0x1c0 [fglrx]
[ 495.045513] [<ffffffffa0301a62>] ? firegl_trace+0x72/0x1e0 [fglrx]
[ 495.045528] [<ffffffffa02d8d13>] ? drm_alloc+0xc3/0x1a0 [fglrx]
[ 495.045562] [<ffffffffa036e883>] ? _ZN15QS_PRIVATE_CORE25escapeMultiMediaInterfaceEP21_QS_QUERY_API_CALL_INPvjS2_j+0xd3/0xe0 [fglrx]
[ 495.045595] [<ffffffffa03633fc>] ? _Z8uCWDDEQCmjjPvjS_+0xe7c/0x10c0 [fglrx]
[ 495.045598] [<ffffffff8103538b>] ? should_resched+0x5/0x23
[ 495.045602] [<ffffffff81070185>] ? arch_local_irq_save+0x11/0x17
[ 495.045622] [<ffffffffa0304032>] ? firegl_cmmqs_CWDDE_32+0x332/0x440 [fglrx]
[ 495.045642] [<ffffffffa0302960>] ? firegl_cmmqs_CWDDE32+0x70/0x100 [fglrx]
[ 495.045661] [<ffffffffa03028f0>] ? firegl_cmmqs_createdriver+0x170/0x170 [fglrx]
[ 495.045677] [<ffffffffa02e09ed>] ? firegl_ioctl+0x1ed/0x250 [fglrx]
[ 495.045680] [<ffffffff8133076a>] ? do_page_fault+0x2fc/0x337
[ 495.045691] [<ffffffffa02d3b28>] ? ip_firegl_unlocked_ioctl+0x6/0xa [fglrx]
[ 495.045693] [<ffffffff81100f94>] ? do_vfs_ioctl+0x452/0x493
[ 495.045696] [<ffffffff8106ed18>] ? sys_futex+0x138/0x148
[ 495.045697] [<ffffffff81101020>] ? sys_ioctl+0x4b/0x6f
[ 495.045700] [<ffffffff81332792>] ? system_call_fastpath+0x16/0x1b
[ 495.045702] pubdev:0xffffffffa055e6a0, num of device:1 , name:fglrx, major 8, minor 92.
[ 495.045704] device 0 : 0xffff8806251ac000 .
[ 495.045706] Asic ID:0x6718, revision:0x1, MMIOReg:0xffffc90013f00000.
[ 495.045708] FB phys addr: 0xd0000000, MC :0xf800000000, Total FB size :0x80000000.
[ 495.045709] gart table MC:0xf80f8ff000, Physical:0xdf8ff000, size:0x400000.
[ 495.045711] mc_node :FB, total 1 zones
[ 495.045713] MC start:0xf800000000, Physical:0xd0000000, size:0xfd00000.
[ 495.045715] Mapped heap -- Offset:0x0, size:0xf8ff000, reference count:18, mapping count:0,
[ 495.045717] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
[ 495.045719] Mapped heap -- Offset:0xf8ff000, size:0x401000, reference count:1, mapping count:0,
[ 495.045721] mc_node :INV_FB, total 1 zones
[ 495.045722] MC start:0xf80fd00000, Physical:0xdfd00000, size:0x70300000.
[ 495.045724] Mapped heap -- Offset:0x702f4000, size:0xc000, reference count:1, mapping count:0,
[ 495.045726] mc_node :GART_USWC, total 2 zones
[ 495.045727] MC start:0xffb0100000, Physical:0x0, size:0x4ff00000.
[ 495.045729] Mapped heap -- Offset:0x0, size:0x2000000, reference count:17, mapping count:0,
[ 495.045731] mc_node :GART_CACHEABLE, total 3 zones
[ 495.045732] MC start:0xff80400000, Physical:0x0, size:0x2fd00000.
[ 495.045734] Mapped heap -- Offset:0x400000, size:0x100000, reference count:2, mapping count:0,
[ 495.045736] Mapped heap -- Offset:0x300000, size:0x100000, reference count:1, mapping count:0,
[ 495.045738] Mapped heap -- Offset:0x200000, size:0x100000, reference count:2, mapping count:0,
[ 495.045740] Mapped heap -- Offset:0x0, size:0x200000, reference count:4, mapping count:0,
[ 495.045742] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
[ 495.045744] Mapped heap -- Offset:0x110000, size:0x33000, reference count:1, mapping count:0,
[ 495.045746] Mapped heap -- Offset:0xf6000, size:0x19000, reference count:1, mapping count:0,
[ 495.045748] Mapped heap -- Offset:0x0, size:0xf5000, reference count:1, mapping count:0,
[ 495.045751] GRBM : 0x3828, SRBM : 0x200000c0 .
[ 495.045754] CP_RB_BASE : 0xffb01000, CP_RB_RPTR : 0x3310 , CP_RB_WPTR :0x3310.
[ 495.045758] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0xff, CP_IB1_BASE_LO:0xb0526000.
[ 495.045760] last submit IB buffer -- MC :0xffb0526000,phys:0x625686000.
[ 495.045761] Dump the trace queue.
[ 495.045762] End of dump
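
In case it helps with reading a dump like the one above, here is a minimal Python sketch that splits the kernel log into one entry per timestamp and lists the call-trace symbols together with the module they belong to. It is not part of gpocl or the fglrx driver; the names `split_entries` and `trace_frames` are made up for illustration, and the regular expressions only assume the dmesg layout seen in this post.

```python
import re

# One call-trace frame, e.g.
# "[<ffffffffa02e09ed>] ? firegl_ioctl+0x1ed/0x250 [fglrx]"
# The trailing "[module]" is optional (absent for core-kernel symbols).
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \? (\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)
# A dmesg timestamp such as "[ 495.045428]".
STAMP = re.compile(r"\[\s*\d+\.\d+\]")


def split_entries(log):
    """Split a dmesg dump into one string per timestamped entry."""
    return [part.strip() for part in STAMP.split(log) if part.strip()]


def trace_frames(log):
    """Return (symbol, module) pairs; module is None for core-kernel frames."""
    return [(m.group(1), m.group(2)) for m in FRAME.finditer(log)]


# A few entries copied from the dump, joined on one line as in the raw paste.
sample = (
    "[ 495.045428] [fglrx] ASIC hang happened "
    "[ 495.045677] [<ffffffffa02e09ed>] ? firegl_ioctl+0x1ed/0x250 [fglrx] "
    "[ 495.045680] [<ffffffff8133076a>] ? do_page_fault+0x2fc/0x337 "
    "[ 495.045754] CP_RB_BASE : 0xffb01000, CP_RB_RPTR : 0x3310 , CP_RB_WPTR :0x3310."
)

if __name__ == "__main__":
    for entry in split_entries(sample):
        print(entry)
    for symbol, module in trace_frames(sample):
        print(symbol, module or "(kernel)")
```

Filtering the frames on the module name makes it easier to see which part of the trace is inside fglrx and which part is the generic ioctl path of the kernel.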
