AMDGPU-PRO Kernel Panic on 16.04 Ubuntu with Kabylake based system

Question asked by jstefanop on Mar 3, 2017
Latest reply on Mar 20, 2017 by jstefanop

A base Ubuntu 16.04 server install with the latest AMDGPU-PRO 16.60.3 drivers causes a kernel panic on a Kaby Lake based system whenever anything OpenCL-related is accessed (in this case, when clinfo is called). Below is a kernel dump of the issue. The same install works fine on a Haswell based system. With the AMD GPU (an RX 470 in this case) removed from the Kaby Lake system, clinfo returns fine and reports the OpenCL info of the Kaby Lake GPU, so the issue is definitely with the AMDGPU-PRO drivers when they try to access the AMD GPU.

[  106.745104] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8

[  106.745133] IP: [<ffffffffc0238d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]

[  106.745154] PGD 273a0b067 PUD 26e9a6067 PMD 0

[  106.745166] Oops: 0000 [#1] SMP

[  106.745176] Modules linked in: cfg80211 x86_pkg_temp_thermal coretemp kvm_intel ipmi_ssif kvm irqbypass snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds soundcore mei_me hci_uart mei btbcm btqca btintel bluetooth 8250_fintek ipmi_msghandler intel_lpss_acpi intel_lpss shpchp acpi_power_meter mac_hid acpi_als kfifo_buf industrialio ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear amdkfd amd_iommu_v2 hid_apple amdgpu(OE) amdttm(OE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ast ablk_helper ttm cryptd

[  106.745371] amdkcl(OE) igb drm_kms_helper dca syscopyarea sysfillrect ptp sysimgblt fb_sys_fops pps_core i2c_algo_bit drm ahci usbhid libahci video i2c_hid pinctrl_sunrisepoint pinctrl_intel hid fjes

[  106.745421] CPU: 1 PID: 1367 Comm: clinfo Tainted: G           OE   4.4.0-64-generic #85-Ubuntu

[  106.745437] Hardware name: Supermicro Super Server/X11SSL(-F)/X11SSM, BIOS 2.0 01/06/2017

[  106.745452] task: ffff88026fc44b00 ti: ffff8802742cc000 task.ti: ffff8802742cc000

[  106.745465] RIP: 0010:[<ffffffffc0238d90>]  [<ffffffffc0238d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]

[  106.745486] RSP: 0018:ffff8802742cf890  EFLAGS: 00010246

[  106.745496] RAX: 00000000024280c0 RBX: 0000000000000000 RCX: ffff88027305ca00

[  106.745509] RDX: 0000000000000001 RSI: 0000000000000040 RDI: 0000000000000090

[  106.745521] RBP: ffff8802742cf928 R08: ffff88027fc9a160 R09: 0000000000000000

[  106.745534] R10: ffff88027305c800 R11: 0000000000000090 R12: ffff880273edf900

[  106.745547] R13: ffff88027305c800 R14: ffff8802742cf8d8 R15: 0000000000000000

[  106.745560] FS:  00007f5db1a4a740(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000

[  106.745574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[  106.745585] CR2: 00000000000000a8 CR3: 0000000273e02000 CR4: 00000000003406e0

[  106.745598] Stack:

[  106.745602] ffff8802742cf960 ffffffff811ecb69 ffff88026fd63600 000000018010000e

[  106.745619] 0000000000000000 024280c000000040 0000000000000038 0000000000000000

[  106.745635] 0000000000000040 ffffffff811ee7d8 ffff880277001400 ffffffffc023109b

[  106.745652] Call Trace:

[  106.745660] [<ffffffff811ecb69>] ? ___slab_alloc+0x1e9/0x470

[  106.745672] [<ffffffff811ee7d8>] ? __kmalloc+0x208/0x250

[  106.745684] [<ffffffffc023109b>] ? amdttm_dma_tt_init+0x6b/0xd0 [amdttm]

[  106.745716] [<ffffffffc026717f>] amdgpu_ttm_tt_populate+0x6f/0x240 [amdgpu]

[  106.745731] [<ffffffffc0230ae7>] amdttm_tt_bind+0x37/0x70 [amdttm]

[  106.745744] [<ffffffffc0232e40>] ttm_bo_handle_move_mem+0x530/0x5a0 [amdttm]

[  106.745758] [<ffffffffc0233d4a>] amdttm_bo_validate+0x13a/0x150 [amdttm]

[  106.745772] [<ffffffffc0233f89>] amdttm_bo_init+0x229/0x430 [amdttm]

[  106.745798] [<ffffffffc026ab07>] amdgpu_bo_create_restricted+0x217/0x530 [amdgpu]

[  106.745821] [<ffffffffc026a2d0>] ? amdgpu_bo_gpu_offset+0x150/0x150 [amdgpu]

[  106.745845] [<ffffffffc026b0cd>] amdgpu_bo_create+0xed/0x190 [amdgpu]

[  106.745867] [<ffffffffc026f3b3>] amdgpu_gem_object_create+0x103/0x1b0 [amdgpu]

[  106.745891] [<ffffffffc026f8dc>] amdgpu_gem_create_ioctl+0xac/0x1b0 [amdgpu]

[  106.745911] [<ffffffffc009b752>] drm_ioctl+0x152/0x540 [drm]

[  106.745933] [<ffffffffc026f830>] ? amdgpu_gem_object_close+0x120/0x120 [amdgpu]

[  106.745948] [<ffffffff8119fd07>] ? lru_cache_add_active_or_unevictable+0x27/0xa0

[  106.746549] [<ffffffffc025504c>] amdgpu_drm_ioctl+0x4c/0x80 [amdgpu]

[  106.747140] [<ffffffff81222b5f>] do_vfs_ioctl+0x29f/0x490

[  106.747731] [<ffffffff8106b514>] ? __do_page_fault+0x1b4/0x400

[  106.748325] [<ffffffff81222dc9>] SyS_ioctl+0x79/0x90

[  106.748924] [<ffffffff8183c5f2>] entry_SYSCALL_64_fastpath+0x16/0x71

[  106.749515] Code: 01 19 c0 4e 8d 9c 3b 90 00 00 00 25 00 80 ff ff 05 c0 80 42 02 4d 85 db 89 45 94 0f 84 4f 02 00 00 49 01 df 4c 89 df 4c 89 4d 88 <41> 8b 87 a8 00 00 00 4c 89 5d 98 4c 89 75 b0 4c 89 75 b8 89 45

[  106.750794] RIP  [<ffffffffc0238d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]

[  106.751422] RSP <ffff8802742cf890>

[  106.752242] CR2: 00000000000000a8

[  106.752863] ---[ end trace 912d1e00331fc37d ]---
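
For anyone triaging a dump like this, here is a minimal Python sketch that pulls out the faulting function, the module, and the fault address, and checks them against the instruction bytes marked with `<..>` in the `Code:` line. The helper name `summarize_oops` and the shortened excerpt are my own and are not part of any AMD or kernel tool. The decoder handles only the single instruction shape seen here (REX.B prefix, opcode `8b`, ModRM `87`, 32-bit displacement), which reads as `mov 0xa8(%r15),%eax`; it is not a general x86 disassembler.

```python
import re

# Excerpt of the dump above: only the lines this sketch needs.
# The Code: line is shortened to the bytes around the faulting instruction.
OOPS = """\
[  106.745104] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[  106.745133] IP: [<ffffffffc0238d90>] amdttm_pool_populate+0x110/0x5c0 [amdttm]
[  106.745547] R13: ffff88027305c800 R14: ffff8802742cf8d8 R15: 0000000000000000
[  106.749515] Code: 49 01 df 4c 89 df 4c 89 4d 88 <41> 8b 87 a8 00 00 00 4c 89 5d 98
"""


def summarize_oops(text):
    """Hypothetical helper: extract the key facts from a 4.4-style oops."""
    fault_addr = int(re.search(r"dereference at ([0-9a-f]+)", text).group(1), 16)

    ip = re.search(
        r"IP: \[<[0-9a-f]+>\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]", text
    )
    r15 = int(re.search(r"R15: ([0-9a-f]+)", text).group(1), 16)

    # The byte wrapped in <..> is the first byte of the faulting instruction.
    code = re.search(r"Code: (.*)", text).group(1).split()
    start = next(i for i, b in enumerate(code) if b.startswith("<"))
    insn = [int(b.strip("<>"), 16) for b in code[start:start + 7]]

    # Decode only this one shape: 41 8b 87 <disp32>
    #   41 -> REX prefix with B=1 (base register is r8..r15)
    #   8b -> MOV r32, r/m32
    #   87 -> ModRM: mod=10 (disp32), reg=000 (eax), rm=111 (r15 with REX.B)
    if insn[:3] != [0x41, 0x8B, 0x87]:
        raise ValueError("unexpected instruction shape; use a real disassembler")
    disp = int.from_bytes(bytes(insn[3:7]), "little")

    return {
        "function": ip.group(1),
        "offset": int(ip.group(2), 16),
        "size": int(ip.group(3), 16),
        "module": ip.group(4),
        "fault_addr": fault_addr,
        "r15": r15,
        "disp": disp,
    }


info = summarize_oops(OOPS)
print(f"{info['function']}+{info['offset']:#x} in [{info['module']}]")
print(f"r15 + disp = {info['r15'] + info['disp']:#x}, CR2 = {info['fault_addr']:#x}")
```

Since R15 is zero and the displacement matches CR2, the crash looks like a read of a field at offset 0xa8 through a NULL structure pointer inside `amdttm_pool_populate`. Which structure that is cannot be determined from the dump alone.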
