2 Replies Latest reply on May 21, 2012 10:52 AM by dhughes

    GPU hanging: AMD APP SDK 2.6 / fglrx 11.12

    d.a.a.

      Hi,

      Is it possible to tell from the trace dump below what is going on here? The GPU is a Radeon HD6970, and I'm using Debian GNU/Linux with AMD APP SDK 2.6 and the fglrx 11.12 drivers (it also hangs with fglrx 11.11/SDK 2.5). Once it hangs, the machine is only usable again after a reboot: I can kill neither the OpenCL application (gpocl) nor the X server. The actual OpenCL kernel is a bit complicated, but it worked before (at least with fglrx 10.12/AMD SDK 2.3) and runs smoothly on Nvidia GPUs.

      [ 495.045428] [fglrx] ASIC hang happened
      [ 495.045431] Pid: 2684, comm: gpocl Tainted: P O 3.1.0-1-amd64 #1
      [ 495.045432] Call Trace:
      [ 495.045458] [<ffffffffa02e4e4c>] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
      [ 495.045493] [<ffffffffa0371084>] ? _ZN18mmEnginesContainer9timestampEP26_QS_MM_TIMESTAMP_PACKET_INP27_QS_MM_TIMESTAMP_PACKET_OUT+0x184/0x1c0 [fglrx]
      [ 495.045513] [<ffffffffa0301a62>] ? firegl_trace+0x72/0x1e0 [fglrx]
      [ 495.045528] [<ffffffffa02d8d13>] ? drm_alloc+0xc3/0x1a0 [fglrx]
      [ 495.045562] [<ffffffffa036e883>] ? _ZN15QS_PRIVATE_CORE25escapeMultiMediaInterfaceEP21_QS_QUERY_API_CALL_INPvjS2_j+0xd3/0xe0 [fglrx]
      [ 495.045595] [<ffffffffa03633fc>] ? _Z8uCWDDEQCmjjPvjS_+0xe7c/0x10c0 [fglrx]
      [ 495.045598] [<ffffffff8103538b>] ? should_resched+0x5/0x23
      [ 495.045602] [<ffffffff81070185>] ? arch_local_irq_save+0x11/0x17
      [ 495.045622] [<ffffffffa0304032>] ? firegl_cmmqs_CWDDE_32+0x332/0x440 [fglrx]
      [ 495.045642] [<ffffffffa0302960>] ? firegl_cmmqs_CWDDE32+0x70/0x100 [fglrx]
      [ 495.045661] [<ffffffffa03028f0>] ? firegl_cmmqs_createdriver+0x170/0x170 [fglrx]
      [ 495.045677] [<ffffffffa02e09ed>] ? firegl_ioctl+0x1ed/0x250 [fglrx]
      [ 495.045680] [<ffffffff8133076a>] ? do_page_fault+0x2fc/0x337
      [ 495.045691] [<ffffffffa02d3b28>] ? ip_firegl_unlocked_ioctl+0x6/0xa [fglrx]
      [ 495.045693] [<ffffffff81100f94>] ? do_vfs_ioctl+0x452/0x493
      [ 495.045696] [<ffffffff8106ed18>] ? sys_futex+0x138/0x148
      [ 495.045697] [<ffffffff81101020>] ? sys_ioctl+0x4b/0x6f
      [ 495.045700] [<ffffffff81332792>] ? system_call_fastpath+0x16/0x1b
      [ 495.045702] pubdev:0xffffffffa055e6a0, num of device:1 , name:fglrx, major 8, minor 92.
      [ 495.045704] device 0 : 0xffff8806251ac000 .
      [ 495.045706] Asic ID:0x6718, revision:0x1, MMIOReg:0xffffc90013f00000.
      [ 495.045708] FB phys addr: 0xd0000000, MC :0xf800000000, Total FB size :0x80000000.
      [ 495.045709] gart table MC:0xf80f8ff000, Physical:0xdf8ff000, size:0x400000.
      [ 495.045711] mc_node :FB, total 1 zones
      [ 495.045713] MC start:0xf800000000, Physical:0xd0000000, size:0xfd00000.
      [ 495.045715] Mapped heap -- Offset:0x0, size:0xf8ff000, reference count:18, mapping count:0,
      [ 495.045717] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
      [ 495.045719] Mapped heap -- Offset:0xf8ff000, size:0x401000, reference count:1, mapping count:0,
      [ 495.045721] mc_node :INV_FB, total 1 zones
      [ 495.045722] MC start:0xf80fd00000, Physical:0xdfd00000, size:0x70300000.
      [ 495.045724] Mapped heap -- Offset:0x702f4000, size:0xc000, reference count:1, mapping count:0,
      [ 495.045726] mc_node :GART_USWC, total 2 zones
      [ 495.045727] MC start:0xffb0100000, Physical:0x0, size:0x4ff00000.
      [ 495.045729] Mapped heap -- Offset:0x0, size:0x2000000, reference count:17, mapping count:0,
      [ 495.045731] mc_node :GART_CACHEABLE, total 3 zones
      [ 495.045732] MC start:0xff80400000, Physical:0x0, size:0x2fd00000.
      [ 495.045734] Mapped heap -- Offset:0x400000, size:0x100000, reference count:2, mapping count:0,
      [ 495.045736] Mapped heap -- Offset:0x300000, size:0x100000, reference count:1, mapping count:0,
      [ 495.045738] Mapped heap -- Offset:0x200000, size:0x100000, reference count:2, mapping count:0,
      [ 495.045740] Mapped heap -- Offset:0x0, size:0x200000, reference count:4, mapping count:0,
      [ 495.045742] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
      [ 495.045744] Mapped heap -- Offset:0x110000, size:0x33000, reference count:1, mapping count:0,
      [ 495.045746] Mapped heap -- Offset:0xf6000, size:0x19000, reference count:1, mapping count:0,
      [ 495.045748] Mapped heap -- Offset:0x0, size:0xf5000, reference count:1, mapping count:0,
      [ 495.045751] GRBM : 0x3828, SRBM : 0x200000c0 .
      [ 495.045754] CP_RB_BASE : 0xffb01000, CP_RB_RPTR : 0x3310 , CP_RB_WPTR :0x3310.
      [ 495.045758] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0xff, CP_IB1_BASE_LO:0xb0526000.
      [ 495.045760] last submit IB buffer -- MC :0xffb0526000,phys:0x625686000.
      [ 495.045761] Dump the trace queue.
      [ 495.045762] End of dump
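
One value that can be read straight off a dump like this is the state of the command processor ring: `CP_RB_RPTR` and `CP_RB_WPTR` are its read and write pointers. The sketch below pulls both out of the dump text and compares them. It assumes the register line layout shown in this thread, and `ring_pointers` is a hypothetical helper name, not part of any driver tooling. Equal pointers only suggest that the ring itself had been drained when the dump was taken; they do not by themselves identify the cause of the hang.

```python
import re

# Hypothetical helper for reading an fglrx hang dump.
# Assumes the "CP_RB_RPTR : 0x... , CP_RB_WPTR :0x..." layout seen in this thread.
RING_RE = re.compile(
    r"CP_RB_RPTR\s*:\s*(0x[0-9a-fA-F]+)\s*,\s*CP_RB_WPTR\s*:\s*(0x[0-9a-fA-F]+)"
)


def ring_pointers(dump: str):
    """Return (rptr, wptr) as ints, or None if the register line is absent."""
    match = RING_RE.search(dump)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = "[ 495.045754] CP_RB_BASE : 0xffb01000, CP_RB_RPTR : 0x3310 , CP_RB_WPTR :0x3310."
rptr, wptr = ring_pointers(line)
print(hex(rptr), hex(wptr), "equal" if rptr == wptr else "differ")
# prints: 0x3310 0x3310 equal
```

The same function works on the whole dmesg output, whether the entries are on separate lines or run together, because it searches for the register names rather than relying on line breaks.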

        • Re: GPU hanging: AMD APP SDK 2.6 / fglrx 11.12
          yurtesen

          Were you able to solve this issue? I think I am having a similar problem.

          • Re: GPU hanging: AMD APP SDK 2.6 / fglrx 11.12
            dhughes

            I am having a similar problem on three systems with Radeon HD 5450 cards, Catalyst drivers and SuSE 11.4.

             

            May 21 07:49:55 ???? kernel: [46085.928017] Pid: 6996, comm: X Tainted: P       O 3.3.6-24-default #1

            May 21 07:49:55 ???? kernel: [46085.928020] Call Trace:

            May 21 07:49:55 ???? kernel: [46085.928045]  [<ffffffff8100445a>] dump_trace+0x9a/0x260

            May 21 07:49:55 ???? kernel: [46085.928054]  [<ffffffff815587a0>] dump_stack+0x69/0x6f

            May 21 07:49:55 ???? kernel: [46085.928157]  [<ffffffffa02650cc>] firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.928270]  [<ffffffffa0300cd9>] _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.928555]  [<ffffffffa0300c7c>] _ZN4Asic9WaitUntil15WaitForCompleteEv+0x9c/0xf0 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.928821]  [<ffffffffa02fb77e>] _ZN15ExecutableUnits10CPRingIdleE15idle_WaitMethod12_QS_CP_RING_+0x11e/0x1e0 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.929075]  [<ffffffffa02fb60c>] _ZN15ExecutableUnits7PM4idleE15idle_WaitMethod+0x4c/0x90 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.929329]  [<ffffffffa02fb13e>] _ZN15ExecutableUnits9assertPM4Eb+0x1e/0x70 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.929583]  [<ffffffffa0305519>] _ZN8AsicR6009assertPM4Eb+0x39/0x80 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.929851]  [<ffffffffa02d3b24>] CMMQS_DisableQS+0x24/0x30 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.930012]  [<ffffffffa02852a8>] firegl_cmmqs_Disable_QS+0x58/0xf0 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.930109]  [<ffffffffa0284112>] firegl_cmmqs_disableqs+0x12/0x70 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.930201]  [<ffffffffa0260ded>] firegl_ioctl+0x1ed/0x250 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.930269]  [<ffffffffa0251e89>] ip_firegl_unlocked_ioctl+0x9/0x10 [fglrx]

            May 21 07:49:55 ???? kernel: [46085.930279]  [<ffffffff81168a15>] do_vfs_ioctl+0x75/0x2d0

            May 21 07:49:55 ???? kernel: [46085.930285]  [<ffffffff81168d08>] sys_ioctl+0x98/0xa0

            May 21 07:49:55 ???? kernel: [46085.930292]  [<ffffffff81578639>] system_call_fastpath+0x16/0x1b

            May 21 07:49:55 ???? kernel: [46085.930306]  [<00007f1ff99ad837>] 0x7f1ff99ad836

            May 21 07:49:55 ???? kernel: [46085.930311] pubdev:0xffffffffa0501320, num of device:1 , name:fglrx, major 8, minor 96.

            May 21 07:49:55 ???? kernel: [46085.930314] device 0 : 0xffff8806252d4000 .

            May 21 07:49:55 ???? kernel: [46085.930317] Asic ID:0x68f9, revision:0x3c, MMIOReg:0xffffc90014ac0000.

            May 21 07:49:55 ???? kernel: [46085.930320] FB phys addr: 0xd0000000, MC :0xf00000000, Total FB size :0x20000000.

            May 21 07:49:55 ???? kernel: [46085.930323] gart table MC:0xf0f8fd000, Physical:0xdf8fd000, size:0x402000.

            May 21 07:49:55 ???? kernel: [46085.930326] mc_node :FB, total 1 zones

            May 21 07:49:55 ???? kernel: [46085.930328] MC start:0xf00000000, Physical:0xd0000000, size:0xfd00000.
            May 21 07:49:55 ???? kernel: [46085.930332] Mapped heap -- Offset:0x0, size:0xf8fd000, reference count:28, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930335] Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930338] Mapped heap -- Offset:0xf8fd000, size:0x403000, reference count:1, mapping count:0,

            May 21 07:49:55 ???? kernel: [46085.930340] mc_node :INV_FB, total 1 zones

            May 21 07:49:55 ???? kernel: [46085.930342] MC start:0xf0fd00000, Physical:0xdfd00000, size:0x10300000.
            May 21 07:49:55 ???? kernel: [46085.930345] Mapped heap -- Offset:0x102f4000, size:0xc000, reference count:1, mapping count:0,

            May 21 07:49:55 ???? kernel: [46085.930348] mc_node :GART_USWC, total 3 zones

            May 21 07:49:55 ???? kernel: [46085.930350] MC start:0x40100000, Physical:0x0, size:0x50000000.
            May 21 07:49:55 ???? kernel: [46085.930353] Mapped heap -- Offset:0x0, size:0x2000000, reference count:9, mapping count:0,

            May 21 07:49:55 ???? kernel: [46085.930355] mc_node :GART_CACHEABLE, total 3 zones

            May 21 07:49:55 ???? kernel: [46085.930357] MC start:0x10400000, Physical:0x0, size:0x2fd00000.
            May 21 07:49:55 ???? kernel: [46085.930360] Mapped heap -- Offset:0x1300000, size:0x300000, reference count:2, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930363] Mapped heap -- Offset:0x1600000, size:0x800000, reference count:2, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930366] Mapped heap -- Offset:0x1000000, size:0x300000, reference count:1, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930369] Mapped heap -- Offset:0xc00000, size:0x400000, reference count:2, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930372] Mapped heap -- Offset:0x900000, size:0x300000, reference count:7, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930375] Mapped heap -- Offset:0x600000, size:0x300000, reference count:2, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930378] Mapped heap -- Offset:0x200000, size:0x400000, reference count:5, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930381] Mapped heap -- Offset:0x0, size:0x200000, reference count:8, mapping count:0,
            May 21 07:49:55 ???? kernel: [46085.930384] Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,

            May 21 07:49:55 ???? kernel: [46085.930388] GRBM : 0xa0003828, SRBM : 0x200000c0 .

            May 21 07:49:55 ???? kernel: [46085.930392] CP_RB_BASE : 0x401000, CP_RB_RPTR : 0x1f500 , CP_RB_WPTR :0x1f500.

            May 21 07:49:55 ???? kernel: [46085.930396] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x4027c000.

            May 21 07:49:55 ???? kernel: [46085.930399] last submit IB buffer -- MC :0x4027c000,phys:0x224cfa000.

            May 21 07:49:55 ???? kernel: [46085.930403] Dump the trace queue.

            May 21 07:49:55 ???? kernel: [46085.930404] End of dump
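
Several fglrx frames in these traces are C++ symbols in the Itanium ABI mangling used by GCC (the `_ZN...E` names). Decoding them makes the call chain easier to follow. The sketch below is a deliberately minimal decoder, not a full demangler: it handles only the nested-name part, a run of `<length><identifier>` components between `_ZN` and `E`, and ignores the parameter types that follow. `nested_name` is a hypothetical helper name; a tool such as `c++filt` from binutils, where available, should give the complete signature.

```python
import re


def nested_name(symbol: str):
    """Decode only the qualified name of an Itanium-ABI '_ZN...E' symbol.

    Parameter types after the closing 'E' are ignored. Anything that is not a
    plain sequence of <length><identifier> components returns None.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        digits = re.match(r"\d+", symbol[pos:])
        if digits is None:
            return None  # component does not start with a length prefix
        length = int(digits.group())
        start = pos + digits.end()
        if start + length > len(symbol):
            return None  # length prefix runs past the end of the symbol
        parts.append(symbol[start:start + length])
        pos = start + length
    if pos >= len(symbol) or not parts:
        return None  # no closing 'E' or no components at all
    return "::".join(parts)


# A frame as it appears in the trace; the "+offset/size" suffix is dropped first.
frame = "_ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10"
print(nested_name(frame.split("+")[0]))
# prints: Asic::WaitUntil::ResetASICIfHung
```

Plain C symbols such as `firegl_ioctl` are not mangled, so the function returns `None` for them and they can be read as they are.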