
krjdev
Adept II

amdgpu-pro 17.40-492261: Maybe a BUG in function "_kcl_reservation_object_copy_fences" in module "amdkcl.ko"

I just installed driver 17.40-492261 on openSUSE Leap 42.3, but I have a problem.

I get multiple messages like this:

[  900.642188] BUG: sleeping function called from invalid context at ../mm/slab.c:2852
[  900.642191] in_atomic(): 1, irqs_disabled(): 0, pid: 3100, name: firefox
[  900.642204] CPU: 3 PID: 3100 Comm: firefox Tainted: G           O     4.4.92-31-default #1
[  900.642205] Hardware name: System manufacturer System Product Name/P5QL/EPU, BIOS 0408    07/20/2009
[  900.642208]  0000000000000000 ffffffff8133a1b7 00000000014000c0 0000000000000030
[  900.642209]  ffffffff811f1226 0000000000000001 0000000000000001 0000000000000003
[  900.642211]  ffffffff810b9665 ffff8802092abc88 ffffffff014000c0 ffff88021ba42bc0
[  900.642211] Call Trace:
[  900.642225]  [<ffffffff81019f29>] dump_trace+0x59/0x320
[  900.642227]  [<ffffffff8101a2ea>] show_stack_log_lvl+0xfa/0x180
[  900.642229]  [<ffffffff8101b091>] show_stack+0x21/0x40
[  900.642232]  [<ffffffff8133a1b7>] dump_stack+0x5c/0x85
[  900.642235]  [<ffffffff811f1226>] __kmalloc+0x146/0x4e0
[  900.642244]  [<ffffffffa046d5cc>] _kcl_reservation_object_copy_fences+0x3c/0x1b0 [amdkcl]
[  900.642261]  [<ffffffffa05fd16d>] ttm_bo_release+0x1bd/0x370 [amdttm]
[  900.642357]  [<ffffffffa07c02b5>] amdgpu_bo_unref+0x25/0x40 [amdgpu]
[  900.642388]  [<ffffffffa07d7594>] amdgpu_vm_free_levels+0x74/0xb0 [amdgpu]
[  900.642419]  [<ffffffffa07d75b4>] amdgpu_vm_free_levels+0x94/0xb0 [amdgpu]
[  900.642448]  [<ffffffffa07dbf40>] amdgpu_vm_fini+0x200/0x300 [amdgpu]
[  900.642475]  [<ffffffffa07b1925>] amdgpu_driver_postclose_kms+0x125/0x1f0 [amdgpu]
[  900.642507]  [<ffffffffa03b5e8c>] drm_release+0x24c/0x4e0 [drm]
[  900.642511]  [<ffffffff81211ae0>] __fput+0xe0/0x210
[  900.642515]  [<ffffffff8109dcd2>] task_work_run+0x72/0xa0
[  900.642518]  [<ffffffff8108335f>] do_exit+0x2ef/0xb60
[  900.642521]  [<ffffffff81083c49>] do_group_exit+0x39/0xa0
[  900.642522]  [<ffffffff81083cc0>] SyS_exit_group+0x10/0x10
[  900.642526]  [<ffffffff816314b2>] entry_SYSCALL_64_fastpath+0x16/0x71
[  900.643940] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x16/0x71
[  900.643941] Leftover inexact backtrace:

I have attached the full dmesg output.

Does anybody else have the same problem?

krjdev
Adept II


I modified the driver source code: in the function _kcl_reservation_object_copy_fences (file kcl_reservation.c under usr/src/amdgpu-17.40-492261/amd/amdkcl), I changed the second parameter of the kmalloc call from GFP_KERNEL to GFP_ATOMIC.

Now the issue is fixed for me.

I'm a newbie at kernel programming, but I think this is the reason:

"GFP_KERNEL isn’t always the right allocation flag to use; sometimes kmalloc is called from outside a process’s context. This type of call can happen, for instance, in interrupt handlers, tasklets, and kernel timers. In this case, the current process should not be put to sleep, and the driver should use a flag of GFP_ATOMIC instead."

(Quote from Linux Device Drivers, Third Edition [LWN.net], Chapter 8: Allocating Memory)

I have attached a patch that fixes this issue.

You can apply the patch by following my description in the thread "amdgpu-pro 16.60: building kernel module (amdgpu-pro-dkms) fails on openSUSE Leap 42.2".

Can anybody test this and give me some feedback?

Edited on 2017-12-06

Reason: Corrected two typos

unixguy2k18
Adept I


Thank you for the patch. I'm planning to use the AMDGPU-PRO drivers for the first time with my laptop dGPU (AMD Radeon 520, GCN 1.0-1.1/SI). Do you know if SI and CIK (GCN 1.0, 1.1, 1.2) support is better in amdgpu-pro than in the open-source amdgpu driver (I'm using the PADOKA PPA)?

krjdev
Adept II


Sorry for the late response.

I cannot answer this question. I currently have an RX 580 (Polaris 20/GCN 4). Before that I had an R9 270X (GCN 1.0, I think), but at that time (2016) the amdgpu/amdgpu-pro driver did not support this card.
