1 Reply Latest reply on Mar 11, 2017 2:22 AM by cedarlug

    R9 290X - amdttm kernel NULL pointer dereference - Ubuntu 16.04 w/ amdgpu-pro-16.60-379184

    cedarlug

      I have a few R9 290 cards that work reliably with amdgpu-pro 16.60, but this one R9 290X card triggers a kernel NULL pointer dereference on load. The same setup works fine with a different R9 290 card, just not with this particular one. Here are the hardware details for the problematic card from lspci:

      08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X] (rev 80) (prog-if 00 [VGA controller])

              Subsystem: PC Partner Limited / Sapphire Technology Hawaii XT [Radeon R9 290X]

              Flags: bus master, fast devsel, latency 0, IRQ 34

              Memory at d0000000 (64-bit, prefetchable) [size=256M]

              Memory at cf800000 (64-bit, prefetchable) [size=8M]

              I/O ports at e000 [size=256]

              Memory at fbe80000 (32-bit, non-prefetchable) [size=256K]

              Expansion ROM at fbe60000 [disabled] [size=128K]

              Capabilities: [48] Vendor Specific Information: Len=08 <?>

              Capabilities: [50] Power Management version 3

              Capabilities: [58] Express Legacy Endpoint, MSI 00

              Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+

              Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>

              Capabilities: [150] Advanced Error Reporting

              Capabilities: [200] #15

              Capabilities: [270] #19

              Capabilities: [2b0] Address Translation Service (ATS)

      This card does work on Debian 8 (Jessie) using fglrx-15.302 on the same box when booted to a different partition (exactly the same hardware).

       

      Yet on a clean install of Ubuntu 16.04 with the amdgpu-pro 16.60 driver, the kernel oopses with:

       

      [   49.682026] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8

      [   49.682081] IP: [<ffffffffc0117d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]

      [   49.682134] PGD 0

      [   49.682148] Oops: 0000 [#1] SMP

      [   49.682176] Modules linked in: bnep gpio_ich snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi coretemp snd_seq_midi_event kvm_intel snd_rawmidi snd_seq btusb kvm btrtl snd_seq_device btbcm snd_timer btintel bluetooth snd irqbypass soundcore joydev input_leds serio_raw lpc_ich ioatdma shpchp 8250_fintek i5500_temp i7core_edac ipmi_ssif dca edac_core ipmi_si ipmi_msghandler mac_hid parport_pc ppdev lp parport autofs4 uas usb_storage amdkfd amd_iommu_v2 hid_logitech ff_memless hid_generic amdgpu(OE) amdttm(OE) ast psmouse ttm amdkcl(OE) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt pata_acpi e1000e fb_sys_fops usbhid ptp hid drm pps_core fjes

      [   49.682752] CPU: 2 PID: 1247 Comm: Xorg Tainted: G          IOE   4.4.0-66-generic #87-Ubuntu

      [   49.682795] Hardware name: empty empty/S7002, BIOS 'V1.10' 05/03/2011

      [   49.682828] task: ffff88007ac02640 ti: ffff88007be14000 task.ti: ffff88007be14000

      [   49.682866] RIP: 0010:[<ffffffffc0117d90>]  [<ffffffffc0117d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]

      [   49.682922] RSP: 0018:ffff88007be17890  EFLAGS: 00010246

      [   49.682950] RAX: 00000000024280c0 RBX: 0000000000000000 RCX: ffff8801b70b0a80

      [   49.682986] RDX: 0000000000000001 RSI: 0000000000000040 RDI: 0000000000000090

      [   49.683021] RBP: ffff88007be17928 R08: ffff88007da9a120 R09: 0000000000000000

      [   49.683057] R10: ffff8801b70b0a00 R11: 0000000000000090 R12: ffff8801b6fc18c0

      [   49.683092] R13: ffff8801b70b0a00 R14: ffff88007be178d8 R15: 0000000000000000

      [   49.683128] FS:  00007f89d6d62a00(0000) GS:ffff88007da80000(0000) knlGS:0000000000000000

      [   49.683168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

      [   49.683197] CR2: 00000000000000a8 CR3: 00000001ba14f000 CR4: 00000000000006e0

      [   49.683232] Stack:

      [   49.683244]  ffff88007be178a0 ffffffff81191845 ffff8801b71d9080 000000017dad6e30

      [   49.683292]  0000000000000000 024280c000000040 ffff88007be178d0 0000000000000000

      [   49.683339]  0000000000000040 ffffffff811ee7d8 ffff88007d4032c0 ffffffffc011009b

      [   49.683386] Call Trace:

      [   49.683405]  [<ffffffff81191845>] ? mempool_alloc_slab+0x15/0x20

      [   49.683438]  [<ffffffff811ee7d8>] ? __kmalloc+0x208/0x250

      [   49.683470]  [<ffffffffc011009b>] ? amdttm_dma_tt_init+0x6b/0xd0 [amdttm]

      [   49.683566]  [<ffffffffc028217f>] amdgpu_ttm_tt_populate+0x6f/0x240 [amdgpu]

      [   49.683606]  [<ffffffffc010fae7>] amdttm_tt_bind+0x37/0x70 [amdttm]

      [   49.683642]  [<ffffffffc0111e40>] ttm_bo_handle_move_mem+0x530/0x5a0 [amdttm]

      [   49.683682]  [<ffffffffc0112d4a>] amdttm_bo_validate+0x13a/0x150 [amdttm]

      [   49.683721]  [<ffffffffc0112f89>] amdttm_bo_init+0x229/0x430 [amdttm]

      [   49.683781]  [<ffffffffc0285b07>] amdgpu_bo_create_restricted+0x217/0x530 [amdgpu]

      [   49.683846]  [<ffffffffc02852d0>] ? amdgpu_bo_gpu_offset+0x150/0x150 [amdgpu]

      [   49.683910]  [<ffffffffc02860cd>] amdgpu_bo_create+0xed/0x190 [amdgpu]

      [   49.683973]  [<ffffffffc028a3b3>] amdgpu_gem_object_create+0x103/0x1b0 [amdgpu]

      [   49.684039]  [<ffffffffc028a8dc>] amdgpu_gem_create_ioctl+0xac/0x1b0 [amdgpu]

      [   49.684105]  [<ffffffffc0043752>] drm_ioctl+0x152/0x540 [drm]

      [   49.684162]  [<ffffffffc028a830>] ? amdgpu_gem_object_close+0x120/0x120 [amdgpu]

      [   49.684204]  [<ffffffff8119fd07>] ? lru_cache_add_active_or_unevictable+0x27/0xa0

      [   49.684266]  [<ffffffffc027004c>] amdgpu_drm_ioctl+0x4c/0x80 [amdgpu]

      [   49.684302]  [<ffffffff81222b5f>] do_vfs_ioctl+0x29f/0x490

      [   49.684334]  [<ffffffff8106b514>] ? __do_page_fault+0x1b4/0x400
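
For anyone triaging this: the fault address is small (0xa8), which is consistent with a field read at that offset through a NULL pointer. That is an inference from the log, not a confirmed diagnosis. Below is a minimal Python sketch that pulls the fault address, faulting symbol, and module out of the two header lines of the oops above. The helper name `summarize_oops` is hypothetical, and the regular expressions assume the 4.4-era oops format shown in this post.

```python
import re

# Two lines copied verbatim from the oops in this post.
OOPS = """\
[   49.682026] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[   49.682081] IP: [<ffffffffc0117d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]
"""


def summarize_oops(text):
    """Extract fault address, symbol, offset, size, and module from an oops.

    Hypothetical helper; returns None if either expected line is missing.
    """
    fault = re.search(r"NULL pointer dereference at ([0-9a-f]+)", text)
    ip = re.search(
        r"IP: \[<[0-9a-f]+>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]",
        text,
    )
    if not fault or not ip:
        return None
    return {
        "fault_addr": int(fault.group(1), 16),  # address that was dereferenced
        "symbol": ip.group(1),                  # function containing the faulting instruction
        "offset": int(ip.group(2), 16),         # byte offset into that function
        "size": int(ip.group(3), 16),           # total size of that function
        "module": ip.group(4),                  # kernel module the symbol lives in
    }


summary = summarize_oops(OOPS)
print(
    f"{summary['module']}:{summary['symbol']} "
    f"+{summary['offset']:#x}/{summary['size']:#x}, "
    f"fault at {summary['fault_addr']:#x}"
)
```

The same function can be pointed at a saved `dmesg` capture to compare this card against a working one.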

       

      Feedback appreciated.