joaulo

Linux amdgpu-pro driver problem

Discussion created by joaulo on Sep 11, 2020

Hi everyone,
I bought my first AMD video card and I'm having a bit of trouble setting it up and getting it working properly.
I use Debian Linux and have just installed Testing (Bullseye), because Stable (Buster) does not support all the components of my motherboard (MSI X570 Tomahawk).
With kernel 5.6 the motherboard works fine, but the video card (Sapphire Nitro+ RX 5700 XT) gives me some problems.

I am using the official Radeon 20.30 amdgpu-pro drivers.

The drivers install without errors, and programs that use OpenCL, such as LibreOffice or Blender, work correctly. However, the system log reports errors like these:

 

[    6.578936] amdgpu: failed send message: TransferTableSmu2Dram (18)  param: 0x00000006 response 0xffffffc2
[    6.578970] amdgpu: Failed to export SMU metrics table!
[    9.344000] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
[    9.344124] [drm:amdgpu_dpm_enable_uvd [amdgpu]] *ERROR* Dpm enable uvd failed, ret = -62.
[   10.392485] amdgpu 0000:2f:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc0 (-110).
[   11.416542] amdgpu 0000:2f:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc1 (-110).
[   12.112096] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
[   14.876439] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
[   14.876474] amdgpu: SMU11 attempt to set divider for DCEFCLK Failed!
[   17.641570] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
[   17.641720] [drm:jpeg_v2_0_set_powergating_state [amdgpu]] *ERROR* Dpm enable jpeg failed, ret = -62.
[   17.641834] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[   17.644981] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
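
The negative values in these messages (ret = -62, -110) look like standard kernel errno codes; on Linux, 62 is ETIME and 110 is ETIMEDOUT, so both point to something timing out. A minimal sketch for decoding them, assuming it is run on a Linux host with Python 3 (the helper name `decode_errno_codes` and the regular expression are my own, not part of any driver tooling):

```python
import errno
import os
import re

# Matches kernel-style negative return codes such as "ret = -62" or "(-110)".
CODE_RE = re.compile(r"ret = -(\d+)|\(-(\d+)\)")


def decode_errno_codes(log_text):
    """Return sorted (code, symbolic name, description) tuples found in log_text."""
    codes = set()
    for match in CODE_RE.finditer(log_text):
        codes.add(int(match.group(1) or match.group(2)))
    return [
        (code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
        for code in sorted(codes)
    ]


# Two lines copied from the log above.
sample = """\
[drm:amdgpu_dpm_enable_uvd [amdgpu]] *ERROR* Dpm enable uvd failed, ret = -62.
[drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc0 (-110).
"""

for code, name, text in decode_errno_codes(sample):
    print(f"-{code}: {name} ({text})")
```

The names and descriptions come from the host's own errno table, so they are only meaningful when the script runs on Linux.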

 

After a few minutes or hours of work, performance drops and the log shows the following output:

 

[ 4302.219194] ------------[ cut here ]------------
[ 4302.219201] WARNING: CPU: 3 PID: 1006 at arch/x86/kernel/fpu/core.c:109 kernel_fpu_end+0x19/0x20
[ 4302.219201] Modules linked in: rfcomm cmac bnep binfmt_misc iwlmvm mac80211 edac_mce_amd kvm_amd libarc4 snd_hda_codec_realtek kvm snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi iwlwifi snd_hda_intel btusb irqbypass snd_intel_dspcfg fuse btrtl btbcm snd_hda_codec btintel nls_ascii bluetooth ghash_clmulni_intel nls_cp437 snd_hda_core cfg80211 vfat fat snd_hwdep efi_pstore snd_pcm_oss drbg snd_mixer_oss snd_pcm ansi_cprng aesni_intel ecdh_generic ecc snd_timer crypto_simd libaes sg sp5100_tco cryptd glue_helper rfkill efivars pcspkr wmi_bmof watchdog k10temp snd ccp soundcore rng_core joydev evdev acpi_cpufreq parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sd_mod amdgpu(OE) amd_sched(OE) amdttm(OE) amdkcl(OE) i2c_algo_bit drm_kms_helper cec ahci xhci_pci libahci xhci_hcd drm libata nvme crc32_pclmul r8125(OE) usbcore nvme_core crc32c_intel r8169 scsi_mod realtek i2c_piix4 t10_pi libphy crc_t10dif mfd_core
[ 4302.219240]  crct10dif_generic usb_common crct10dif_pclmul crct10dif_common wmi button
[ 4302.219245] CPU: 3 PID: 1006 Comm: Xorg Tainted: G           OE     5.6.0-0.bpo.2-amd64 #1 Debian 5.6.14-2~bpo10+1
[ 4302.219245] Hardware name: Micro-Star International Co., Ltd. MS-7C84/MAG X570 TOMAHAWK WIFI (MS-7C84), BIOS 1.00 04/11/2020
[ 4302.219248] RIP: 0010:kernel_fpu_end+0x19/0x20
[ 4302.219249] Code: c3 65 8a 05 c1 39 de 7c 83 f0 01 c3 0f 1f 44 00 00 0f 1f 44 00 00 65 8a 05 ac 39 de 7c 84 c0 74 09 65 c6 05 a0 39 de 7c 00 c3 <0f> 0b eb f3 0f 1f 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53
[ 4302.219250] RSP: 0018:ffffae1a809c3b30 EFLAGS: 00010246
[ 4302.219251] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000098b4
[ 4302.219252] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000030080
[ 4302.219253] RBP: ffff8ea726920000 R08: 0000000000000000 R09: 0000000000000000
[ 4302.219253] R10: ffffae1a809c3a80 R11: 0000000000000401 R12: ffff8ea48a560000
[ 4302.219254] R13: 0000000000000006 R14: ffff8ea48a561e98 R15: ffff8ea72bb6a800
[ 4302.219255] FS:  00007fc8c477c580(0000) GS:ffff8ea73e8c0000(0000) knlGS:0000000000000000
[ 4302.219256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4302.219257] CR2: 00007f1b52751000 CR3: 00000007f11ac000 CR4: 0000000000340ee0
[ 4302.219258] Call Trace:
[ 4302.219366]  dcn20_validate_bandwidth+0x2b/0x40 [amdgpu]
[ 4302.219462]  dc_validate_global_state+0x288/0x340 [amdgpu]
[ 4302.219561]  amdgpu_dm_atomic_check+0x910/0xe40 [amdgpu]
[ 4302.219583]  drm_atomic_check_only+0x566/0x7e0 [drm]
[ 4302.219601]  ? drm_mode_object_put.part.2+0x1f/0x50 [drm]
[ 4302.219616]  ? drm_atomic_set_property+0x82/0xa00 [drm]
[ 4302.219631]  drm_atomic_commit+0x13/0x50 [drm]
[ 4302.219646]  drm_mode_obj_set_property_ioctl+0x149/0x2e0 [drm]
[ 4302.219660]  ? drm_property_create_blob.part.3+0xd8/0x110 [drm]
[ 4302.219673]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ 4302.219687]  drm_ioctl_kernel+0xac/0xf0 [drm]
[ 4302.219702]  drm_ioctl+0x201/0x3a0 [drm]
[ 4302.219715]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ 4302.219808]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 4302.219814]  ksys_ioctl+0x86/0xc0
[ 4302.219817]  __x64_sys_ioctl+0x16/0x20
[ 4302.219819]  do_syscall_64+0x52/0x170
[ 4302.219823]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4302.219825] RIP: 0033:0x7fc8c4dbfd87
[ 4302.219827] Code: 00 00 00 48 8b 05 09 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 90 0c 00 f7 d8 64 89 01 48
[ 4302.219828] RSP: 002b:00007ffcfa3eb1e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4302.219830] RAX: ffffffffffffffda RBX: 00007ffcfa3eb220 RCX: 00007fc8c4dbfd87
[ 4302.219831] RDX: 00007ffcfa3eb220 RSI: 00000000c01864ba RDI: 000000000000000f
[ 4302.219831] RBP: 00000000c01864ba R08: 0000000000000073 R09: 00000000cccccccc
[ 4302.219832] R10: 0000000002000000 R11: 0000000000000246 R12: 0000000000000100
[ 4302.219833] R13: 000000000000000f R14: 00005645e9d3fe30 R15: 0000000000000100
[ 4302.219836] ---[ end trace a671be153db9989e ]---
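
To make the trace easier to read, here is a rough sketch that keeps only the reliable frames (the ones not prefixed with "?", which the kernel uses for addresses it could not verify) and shows which module each belongs to. The function name `reliable_frames` and the regular expression are my own assumptions about the line layout seen above, not an official parser:

```python
import re

# A frame looks like "[ time ]  ? symbol+0xoff/0xlen [module]"; the "?" and the module are optional.
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def reliable_frames(trace_text):
    """Return (symbol, module) for each frame not marked '?'.

    Frames without a [module] suffix are reported as 'kernel' (built-in code).
    """
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            frames.append((match.group(2), match.group(3) or "kernel"))
    return frames


# Three lines copied from the call trace above.
sample_trace = """\
[ 4302.219366]  dcn20_validate_bandwidth+0x2b/0x40 [amdgpu]
[ 4302.219601]  ? drm_mode_object_put.part.2+0x1f/0x50 [drm]
[ 4302.219814]  ksys_ioctl+0x86/0xc0
"""

for symbol, module in reliable_frames(sample_trace):
    print(f"{module:8s} {symbol}")
```

Read this way, the warning is raised in the amdgpu display code (dcn20_validate_bandwidth), reached through a DRM property ioctl issued by Xorg.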

 

The biggest problem is a crash when starting Unreal Engine, with this error:

 

free(): invalid pointer
Signal 6 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=6
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=374880 LargeMemoryPoolOffset=506016
[2020.04.16-02.49.50:104][ 0]LogCore: === Critical error: ===
Unhandled Exception: SIGABRT: abort() called

[2020.04.16-02.49.50:104][ 0]LogCore: Fatal error!

0x00007f78177e6e97 libc.so.6!gsignal(+0xc7)
0x00007f78177e8801 libc.so.6!abort(+0x140)
0x00007f7817831897 libc.so.6!UnknownFunction(0x89896)
0x00007f781783890a libc.so.6!UnknownFunction(0x90909)
0x00007f781783fe1c libc.so.6!cfree(+0x4cb)
0x00007f77de5584af amdgpu_dri.so!UnknownFunction(0x13e84ae)
0x00007f77de7a8a0a amdgpu_dri.so!UnknownFunction(0x1638a09)
0x00007f77de55947b amdgpu_dri.so!UnknownFunction(0x13e947a)
0x00007f77de55a7e0 amdgpu_dri.so!UnknownFunction(0x13ea7df)
0x00007f77de585887 amdgpu_dri.so!UnknownFunction(0x1415886)
0x00007f77de5862e7 amdgpu_dri.so!UnknownFunction(0x14162e6)
0x00007f77de402cd8 amdgpu_dri.so!UnknownFunction(0x1292cd7)
0x00007f77dddc20fc amdgpu_dri.so!UnknownFunction(0xc520fb)
0x00007f77de59bc4c amdgpu_dri.so!UnknownFunction(0x142bc4b)
0x00007f77de06ce48 amdgpu_dri.so!UnknownFunction(0xefce47)
0x00007f77de429533 amdgpu_dri.so!UnknownFunction(0x12b9532)
0x00007f77de480a2e amdgpu_dri.so!UnknownFunction(0x1310a2d)
0x00007f77e1c1536f libGL.so.1!UnknownFunction(0x7b36e)
0x00007f77e1c266c9 libGL.so.1!UnknownFunction(0x8c6c8)
0x00007f77e1c1d2ab libGL.so.1!UnknownFunction(0x832aa)
0x00007f77e1be7323 libGL.so.1!glXChooseVisual(+0x52)
0x00007f781b5a644c libUE4Editor-ApplicationCore.so!X11_GL_GetVisual [/SDL-gui-backend/src/video/x11/SDL_x11opengl.c:606]
0x00007f781b5a6743 libUE4Editor-ApplicationCore.so!X11_GL_LoadLibrary [/SDL-gui-backend/src/video/x11/SDL_x11opengl.c:235]
0x00007f781b4f7506 libUE4Editor-ApplicationCore.so!SDL_CreateWindow_REAL [/SDL-gui-backend/src/video/SDL_video.c:1462]
0x00007f781b4f7041 libUE4Editor-ApplicationCore.so!SDL_VideoInit_REAL [/SDL-gui-backend/src/video/SDL_video.c:555]
0x00007f781b55313f libUE4Editor-ApplicationCore.so!SDL_Init_REAL [/SDL-gui-backend/src/SDL.c:255]
0x00007f781b48b34e libUE4Editor-ApplicationCore.so!FLinuxPlatformApplicationMisc::InitSDL() [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/ApplicationCore/Private/Linux/LinuxPlatformApplicationMisc.cpp:287]
0x00007f781b488bb6 libUE4Editor-ApplicationCore.so!FDisplayMetrics::RebuildDisplayMetrics(FDisplayMetrics&) [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/ApplicationCore/Private/Linux/LinuxApplication.cpp:1435]
0x00007f781d13736a libUE4Editor-Engine.so!UGameEngine::DetermineGameWindowResolution(int&, int&, EWindowMode::Type&, bool) [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/Engine/Private/GameEngine.cpp:326]
0x00007f781d1a58de libUE4Editor-Engine.so!UGameUserSettings::PreloadResolutionSettings() [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/Engine/Private/GameUserSettings.cpp:610]
0x0000000000244901 UE4Editor!FEngineLoop::PreInitPreStartupScreen(char16_t const*) [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/Launch/Private/LaunchEngineLoop.cpp:1963]
0x000000000023e2ec UE4Editor!GuardedMain(char16_t const*) [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/Launch/Private/Launch.cpp:131]
0x00007f7820fd0191 libUE4Editor-UnixCommonStartup.so!CommonUnixMain(int, char**, int (*)(char16_t const*), void (*)()) [/mnt/2fe95e40-f8ba-418e-804c-0a25571f7b0c/Unreal/EngineRepo/Engine/Source/Runtime/Unix/UnixCommonStartup/Private/UnixCommonStartup.cpp:264]
0x00007f78177c9b97 libc.so.6!__libc_start_main(+0xe6)
0x000000000022b029 UE4Editor!_start()

 

If I remove the amdgpu-pro driver and revert to the open source driver, Unreal Engine starts; however, OpenCL support is then unavailable, and programs such as Blender cannot use hardware acceleration for rendering.
Also, compiling the amdgpu-pro driver for the 5.7 kernel fails.

 

Can anyone suggest how to get a properly functioning system with full hardware acceleration and OpenCL support?

 

Thank you
