
Vega Frontier driver failing to load in Ubuntu 16.04 VM

Question asked by ratbuddy on Jul 3, 2017

I'm running the following:

Ryzen 7 1700, Prime X370 Pro, 64GB RAM, Vega Frontier, Proxmox (most recent beta). I've passed the GPU through to an Ubuntu 16.04 guest OS. The guest has the ROCm kernel installed and running:

---

root@ubuntu:~# uname -r

4.9.0-kfd-compute-rocm-rel-1.6-77

---

The GPU shows up as follows:

---

root@ubuntu:~# lspci -nn | grep VGA

00:01.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8]

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:6863]

root@ubuntu:~# inxi -Gx

Graphics:  Card-1: Cirrus Logic GD 5446 bus-ID: 00:01.0

           Card-2: Advanced Micro Devices [AMD/ATI] Device 6863 bus-ID: 01:00.0

           Display Server: N/A driver: N/A tty size: 120x30 Advanced Data: N/A for root out of X

root@ubuntu:~# hwinfo --gfx

29: PCI 01.0: 0300 VGA compatible controller (VGA)

  [Created at pci.366]

  Unique ID: vSkL.v67VofT6db4

  SysFS ID: /devices/pci0000:00/0000:00:01.0

  SysFS BusID: 0000:00:01.0

  Hardware Class: graphics card

  Model: "Red Hat QEMU Virtual Machine"

  Vendor: pci 0x1013 "Cirrus Logic"

  Device: pci 0x00b8 "GD 5446"

  SubVendor: pci 0x1af4 "Red Hat, Inc"

  SubDevice: pci 0x1100 "QEMU Virtual Machine"

  Driver: "cirrus"

  Driver Modules: "drm"

  Memory Range: 0xf0000000-0xf1ffffff (ro,non-prefetchable)

  Memory Range: 0xfea14000-0xfea14fff (rw,non-prefetchable)

  Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)

  Module Alias: "pci:v00001013d000000B8sv00001AF4sd00001100bc03sc00i00"

  Driver Info #0:

    XFree86 v4 Server Module: cirrus

  Config Status: cfg=new, avail=yes, need=no, active=unknown

 

 

31: PCI 100.0: 0300 VGA compatible controller (VGA)

  [Created at pci.366]

  Unique ID: VCu0.l3K+wrIIbD4

  Parent ID: z8Q3.GJ8AwRBw1x3

  SysFS ID: /devices/pci0000:00/0000:00:1c.0/0000:01:00.0

  SysFS BusID: 0000:01:00.0

  Hardware Class: graphics card

  Model: "ATI VGA compatible controller"

  Vendor: pci 0x1002 "ATI Technologies Inc"

  Device: pci 0x6863

  SubVendor: pci 0x1002 "ATI Technologies Inc"

  SubDevice: pci 0x6b76

  Memory Range: 0xd0000000-0xdfffffff (ro,non-prefetchable)

  Memory Range: 0xe0000000-0xe01fffff (ro,non-prefetchable)

  I/O Ports: 0xd000-0xdfff (rw)

  Memory Range: 0xfe800000-0xfe87ffff (rw,non-prefetchable)

  Memory Range: 0xfe880000-0xfe89ffff (ro,non-prefetchable,disabled)

  IRQ: 16 (no events)

  Module Alias: "pci:v00001002d00006863sv00001002sd00006B76bc03sc00i00"

  Driver Info #0:

    Driver Status: amdgpu is active

    Driver Activation Cmd: "modprobe amdgpu"

  Config Status: cfg=new, avail=yes, need=no, active=unknown

  Attached to: #24 (PCI bridge)

 

 

Primary display adapter: #29

---

dmesg shows the following errors from amdgpu:

---

root@ubuntu:~# dmesg | grep amdgpu

[    4.650021] [drm] amdgpu kernel modesetting enabled.

[    5.045078] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

[    5.074085] amdgpu 0000:01:00.0: VRAM: 16368M 0x000000F400000000 - 0x000000F7FEFFFFFF (16368M used)

[    5.074788] amdgpu 0000:01:00.0: GTT: 32093M 0x000000F7FF000000 - 0x000000FFD4DEFFFF

[    5.077230] [drm] amdgpu: 16368M of VRAM memory ready

[    5.078085] [drm] amdgpu: 32093M of GTT memory ready.

[    5.082329] amdgpu 0000:01:00.0: amdgpu: using MSI.

[    5.083591] [drm] amdgpu: irq initialized.

[    5.509514] amdgpu: [powerplay] amdgpu: powerplay sw initialized

[    5.511423] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x000000f7ff000008, cpu addr 0xffff9a4664549008

[    5.512479] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x000000f7ff000010, cpu addr 0xffff9a4664549010

[    5.513591] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x000000f7ff000018, cpu addr 0xffff9a4664549018

[    5.514621] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x000000f7ff000028, cpu addr 0xffff9a4664549028

[    5.515630] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x000000f7ff000030, cpu addr 0xffff9a4664549030

[    5.516614] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000f7ff000038, cpu addr 0xffff9a4664549038

[    5.517610] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x000000f7ff000048, cpu addr 0xffff9a4664549048

[    5.518587] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x000000f7ff000050, cpu addr 0xffff9a4664549050

[    5.519568] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x000000f7ff000058, cpu addr 0xffff9a4664549058

[    5.521610] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x000000f7ff000068, cpu addr 0xffff9a4664549068

[    5.523571] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x000000f7ff000070, cpu addr 0xffff9a4664549070

[    6.797156] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x000000f403f80600, cpu addr 0xffffa85ec295a600

[    6.798507] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x000000f7ff000098, cpu addr 0xffff9a4664549098

[    6.799650] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x000000f7ff0000b0, cpu addr 0xffff9a46645490b0

[    6.802787] amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x000000f7ff0000c8, cpu addr 0xffff9a46645490c8

[    6.803859] amdgpu 0000:01:00.0: fence driver on ring 15 use gpu addr 0x000000f7ff0000d8, cpu addr 0xffff9a46645490d8

[    6.804999] amdgpu 0000:01:00.0: fence driver on ring 16 use gpu addr 0x000000f7ff0000f0, cpu addr 0xffff9a46645490f0

[    6.928623] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed

[    6.929686] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22

[    6.930625] amdgpu 0000:01:00.0: amdgpu_init failed

[    8.073346] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) cirrus i2c_algo_bit ttm drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt ahci fb_sys_fops hid psmouse libahci drm

[    8.076263]  [<ffffffffc02b2f19>] amdgpu_gtt_mgr_fini+0x39/0x70 [amdgpu]

[    8.076263]  [<ffffffffc0293a4e>] amdgpu_ttm_fini+0xce/0x220 [amdgpu]

[    8.076263]  [<ffffffffc0295202>] amdgpu_bo_fini+0x12/0x40 [amdgpu]

[    8.076263]  [<ffffffffc02e3122>] gmc_v9_0_sw_fini+0x32/0x40 [amdgpu]

[    8.076263]  [<ffffffffc02809df>] amdgpu_fini+0x2af/0x460 [amdgpu]

[    8.076263]  [<ffffffffc02830e8>] amdgpu_device_init+0xf68/0x11b0 [amdgpu]

[    8.076263]  [<ffffffffc0286318>] ? amdgpu_driver_load_kms+0x28/0x230 [amdgpu]

[    8.076263]  [<ffffffffc028634d>] amdgpu_driver_load_kms+0x5d/0x230 [amdgpu]

[    8.076263]  [<ffffffffc027f49e>] amdgpu_pci_probe+0xbe/0xf0 [amdgpu]

[    8.076263]  [<ffffffffc04dc093>] amdgpu_init+0x93/0xa4 [amdgpu]

[    8.124288] [drm] amdgpu: ttm finalized

[    8.124876] amdgpu 0000:01:00.0: Fatal error during GPU init

[    8.125462] [drm] amdgpu: finishing device.

[    8.129037] amdgpu: probe of 0000:01:00.0 failed with error -22

---
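The log above is long, and only a handful of its lines report a failure. A minimal sketch of how those lines could be pulled out, assuming the dmesg output is available as text; the function name `amdgpu_failures` and the pattern list are my own choices based on the messages in this log, not anything provided by amdgpu or ROCm:

```python
import re

# Substrings that mark a failure in the log above (an assumption drawn
# from this one log, not an exhaustive list of amdgpu error messages).
FAILURE = re.compile(r"\*ERROR\*|failed|Fatal error|Invalid PCI ROM")


def amdgpu_failures(dmesg_text):
    """Return the amdgpu lines in dmesg_text that look like failures."""
    return [
        line.strip()
        for line in dmesg_text.splitlines()
        if "amdgpu" in line and FAILURE.search(line)
    ]


# Three lines copied from the dmesg output above.
sample = """\
[    5.083591] [drm] amdgpu: irq initialized.
[    6.928623] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed
[    8.129037] amdgpu: probe of 0000:01:00.0 failed with error -22
"""

for line in amdgpu_failures(sample):
    print(line)
```

Run against the full log, this should leave the ROM header warning, the PSP firmware failure, and the failed probe, which are the lines that seem most relevant here.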

Here's the actual problem I'm trying to fix:

---

root@ubuntu:/opt/rocm/hsa/sample# ./vector_copy

Initializing the hsa runtime succeeded.

Checking finalizer 1.0 extension support succeeded.

Generating function table for finalizer succeeded.

Getting a gpu agent failed.

---

 

I'd greatly appreciate any advice on resolving this issue. Thanks!
