0 Replies Latest reply on Jul 4, 2017 1:11 AM by ratbuddy

    Vega Frontier driver failing to load in Ubuntu 16.04 VM

    ratbuddy

      I'm running the following:

      Ryzen 7 1700, Prime X370 Pro, 64GB RAM, Vega Frontier, Proxmox (most recent beta). I've passed the GPU through to an Ubuntu 16.04 guest, which has ROCm installed and is running the ROCm kernel:

      ---

      root@ubuntu:~# uname -r

      4.9.0-kfd-compute-rocm-rel-1.6-77

      ---

      The GPU shows up as follows:

      ---

      root@ubuntu:~# lspci -nn | grep VGA

      00:01.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8]

      01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:6863]

      root@ubuntu:~# inxi -Gx

      Graphics:  Card-1: Cirrus Logic GD 5446 bus-ID: 00:01.0

                 Card-2: Advanced Micro Devices [AMD/ATI] Device 6863 bus-ID: 01:00.0

                 Display Server: N/A driver: N/A tty size: 120x30 Advanced Data: N/A for root out of X

      root@ubuntu:~# hwinfo --gfx

      29: PCI 01.0: 0300 VGA compatible controller (VGA)

        [Created at pci.366]

        Unique ID: vSkL.v67VofT6db4

        SysFS ID: /devices/pci0000:00/0000:00:01.0

        SysFS BusID: 0000:00:01.0

        Hardware Class: graphics card

        Model: "Red Hat QEMU Virtual Machine"

        Vendor: pci 0x1013 "Cirrus Logic"

        Device: pci 0x00b8 "GD 5446"

        SubVendor: pci 0x1af4 "Red Hat, Inc"

        SubDevice: pci 0x1100 "QEMU Virtual Machine"

        Driver: "cirrus"

        Driver Modules: "drm"

        Memory Range: 0xf0000000-0xf1ffffff (ro,non-prefetchable)

        Memory Range: 0xfea14000-0xfea14fff (rw,non-prefetchable)

        Memory Range: 0x000c0000-0x000dffff (rw,non-prefetchable,disabled)

        Module Alias: "pci:v00001013d000000B8sv00001AF4sd00001100bc03sc00i00"

        Driver Info #0:

          XFree86 v4 Server Module: cirrus

        Config Status: cfg=new, avail=yes, need=no, active=unknown

       

       

      31: PCI 100.0: 0300 VGA compatible controller (VGA)

        [Created at pci.366]

        Unique ID: VCu0.l3K+wrIIbD4

        Parent ID: z8Q3.GJ8AwRBw1x3

        SysFS ID: /devices/pci0000:00/0000:00:1c.0/0000:01:00.0

        SysFS BusID: 0000:01:00.0

        Hardware Class: graphics card

        Model: "ATI VGA compatible controller"

        Vendor: pci 0x1002 "ATI Technologies Inc"

        Device: pci 0x6863

        SubVendor: pci 0x1002 "ATI Technologies Inc"

        SubDevice: pci 0x6b76

        Memory Range: 0xd0000000-0xdfffffff (ro,non-prefetchable)

        Memory Range: 0xe0000000-0xe01fffff (ro,non-prefetchable)

        I/O Ports: 0xd000-0xdfff (rw)

        Memory Range: 0xfe800000-0xfe87ffff (rw,non-prefetchable)

        Memory Range: 0xfe880000-0xfe89ffff (ro,non-prefetchable,disabled)

        IRQ: 16 (no events)

        Module Alias: "pci:v00001002d00006863sv00001002sd00006B76bc03sc00i00"

        Driver Info #0:

          Driver Status: amdgpu is active

          Driver Activation Cmd: "modprobe amdgpu"

        Config Status: cfg=new, avail=yes, need=no, active=unknown

        Attached to: #24 (PCI bridge)

       

       

      Primary display adapter: #29

      ---

      The amdgpu driver reports the following errors in dmesg:

      ---

      root@ubuntu:~# dmesg | grep amdgpu

      [    4.650021] [drm] amdgpu kernel modesetting enabled.

      [    5.045078] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

      [    5.074085] amdgpu 0000:01:00.0: VRAM: 16368M 0x000000F400000000 - 0x000000F7FEFFFFFF (16368M used)

      [    5.074788] amdgpu 0000:01:00.0: GTT: 32093M 0x000000F7FF000000 - 0x000000FFD4DEFFFF

      [    5.077230] [drm] amdgpu: 16368M of VRAM memory ready

      [    5.078085] [drm] amdgpu: 32093M of GTT memory ready.

      [    5.082329] amdgpu 0000:01:00.0: amdgpu: using MSI.

      [    5.083591] [drm] amdgpu: irq initialized.

      [    5.509514] amdgpu: [powerplay] amdgpu: powerplay sw initialized

      [    5.511423] amdgpu 0000:01:00.0: fence driver on ring 0 use gpu addr 0x000000f7ff000008, cpu addr 0xffff9a4664549008

      [    5.512479] amdgpu 0000:01:00.0: fence driver on ring 1 use gpu addr 0x000000f7ff000010, cpu addr 0xffff9a4664549010

      [    5.513591] amdgpu 0000:01:00.0: fence driver on ring 2 use gpu addr 0x000000f7ff000018, cpu addr 0xffff9a4664549018

      [    5.514621] amdgpu 0000:01:00.0: fence driver on ring 3 use gpu addr 0x000000f7ff000028, cpu addr 0xffff9a4664549028

      [    5.515630] amdgpu 0000:01:00.0: fence driver on ring 4 use gpu addr 0x000000f7ff000030, cpu addr 0xffff9a4664549030

      [    5.516614] amdgpu 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000f7ff000038, cpu addr 0xffff9a4664549038

      [    5.517610] amdgpu 0000:01:00.0: fence driver on ring 6 use gpu addr 0x000000f7ff000048, cpu addr 0xffff9a4664549048

      [    5.518587] amdgpu 0000:01:00.0: fence driver on ring 7 use gpu addr 0x000000f7ff000050, cpu addr 0xffff9a4664549050

      [    5.519568] amdgpu 0000:01:00.0: fence driver on ring 8 use gpu addr 0x000000f7ff000058, cpu addr 0xffff9a4664549058

      [    5.521610] amdgpu 0000:01:00.0: fence driver on ring 9 use gpu addr 0x000000f7ff000068, cpu addr 0xffff9a4664549068

      [    5.523571] amdgpu 0000:01:00.0: fence driver on ring 10 use gpu addr 0x000000f7ff000070, cpu addr 0xffff9a4664549070

      [    6.797156] amdgpu 0000:01:00.0: fence driver on ring 11 use gpu addr 0x000000f403f80600, cpu addr 0xffffa85ec295a600

      [    6.798507] amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x000000f7ff000098, cpu addr 0xffff9a4664549098

      [    6.799650] amdgpu 0000:01:00.0: fence driver on ring 13 use gpu addr 0x000000f7ff0000b0, cpu addr 0xffff9a46645490b0

      [    6.802787] amdgpu 0000:01:00.0: fence driver on ring 14 use gpu addr 0x000000f7ff0000c8, cpu addr 0xffff9a46645490c8

      [    6.803859] amdgpu 0000:01:00.0: fence driver on ring 15 use gpu addr 0x000000f7ff0000d8, cpu addr 0xffff9a46645490d8

      [    6.804999] amdgpu 0000:01:00.0: fence driver on ring 16 use gpu addr 0x000000f7ff0000f0, cpu addr 0xffff9a46645490f0

      [    6.928623] [drm:psp_hw_init [amdgpu]] *ERROR* PSP firmware loading failed

      [    6.929686] [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <psp> failed -22

      [    6.930625] amdgpu 0000:01:00.0: amdgpu_init failed

      [    8.073346] Modules linked in: amdkfd amd_iommu_v2 amdgpu(+) cirrus i2c_algo_bit ttm drm_kms_helper hid_generic syscopyarea sysfillrect usbhid sysimgblt ahci fb_sys_fops hid psmouse libahci drm

      [    8.076263]  [<ffffffffc02b2f19>] amdgpu_gtt_mgr_fini+0x39/0x70 [amdgpu]

      [    8.076263]  [<ffffffffc0293a4e>] amdgpu_ttm_fini+0xce/0x220 [amdgpu]

      [    8.076263]  [<ffffffffc0295202>] amdgpu_bo_fini+0x12/0x40 [amdgpu]

      [    8.076263]  [<ffffffffc02e3122>] gmc_v9_0_sw_fini+0x32/0x40 [amdgpu]

      [    8.076263]  [<ffffffffc02809df>] amdgpu_fini+0x2af/0x460 [amdgpu]

      [    8.076263]  [<ffffffffc02830e8>] amdgpu_device_init+0xf68/0x11b0 [amdgpu]

      [    8.076263]  [<ffffffffc0286318>] ? amdgpu_driver_load_kms+0x28/0x230 [amdgpu]

      [    8.076263]  [<ffffffffc028634d>] amdgpu_driver_load_kms+0x5d/0x230 [amdgpu]

      [    8.076263]  [<ffffffffc027f49e>] amdgpu_pci_probe+0xbe/0xf0 [amdgpu]

      [    8.076263]  [<ffffffffc04dc093>] amdgpu_init+0x93/0xa4 [amdgpu]

      [    8.124288] [drm] amdgpu: ttm finalized

      [    8.124876] amdgpu 0000:01:00.0: Fatal error during GPU init

      [    8.125462] [drm] amdgpu: finishing device.

      [    8.129037] amdgpu: probe of 0000:01:00.0 failed with error -22

      ---
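To pull just the failure lines out of a log like the one above, a minimal Python sketch could look like the following. The function name `failure_lines` and the `FAILURE_MARKERS` list are made up for illustration; the markers are simply the strings that appear in this particular log, not an exhaustive list of amdgpu failure messages.

```python
import re

# Matches the "[    6.928623] " timestamp prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s*(.*)$")

# Assumption: these markers are taken from the log in this post only.
FAILURE_MARKERS = ("*ERROR*", "failed", "Fatal error", "Invalid PCI ROM")


def failure_lines(dmesg_text):
    """Return (seconds, message) for every timestamped line with a failure marker."""
    hits = []
    for raw in dmesg_text.splitlines():
        match = TIMESTAMP.match(raw)
        if not match:
            continue  # skip blank lines and anything without a timestamp
        seconds = float(match.group(1))
        message = match.group(2)
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((seconds, message))
    return hits
```

Run against a saved copy of the log (for example the output of `dmesg > dmesg.txt`), the first hit in time order is the PCI ROM header complaint at 5.045078, followed by the PSP firmware loading failure at 6.928623, which makes the order of the failures easier to see than in the full output.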

      Here's the actual problem I'm trying to fix: the HSA sample program can't get a GPU agent:

      ---

      root@ubuntu:/opt/rocm/hsa/sample# ./vector_copy

      Initializing the hsa runtime succeeded.

      Checking finalizer 1.0 extension support succeeded.

      Generating function table for finalizer succeeded.

      Getting a gpu agent failed.

      ---
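Since the dmesg output ends with the amdgpu probe failing, one quick sanity check before rerunning the sample is whether the card has a driver bound at all and whether the KFD device node exists. The sketch below is only an illustration: the helper names `bound_driver` and `gpu_compute_ready` are hypothetical, the bus ID `0000:01:00.0` comes from the lspci output in this post, and passing both checks is a necessary condition rather than proof that the HSA runtime will find an agent.

```python
import os


def bound_driver(bus_id, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    sysfs exposes a `driver` symlink in the device directory only while
    a driver is bound to that device.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bus_id, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def gpu_compute_ready(bus_id, sysfs_root="/sys", kfd_node="/dev/kfd"):
    """True if amdgpu is bound to the device and the KFD node exists.

    Assumption: both are prerequisites for ROCm on this kernel; the KFD
    node can exist on its own when only the amdkfd module has loaded.
    """
    return bound_driver(bus_id, sysfs_root) == "amdgpu" and os.path.exists(kfd_node)
```

On the guest, `bound_driver("0000:01:00.0")` would be expected to return `None` for as long as the probe fails with error -22, which would be consistent with `vector_copy` not finding a GPU agent.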

       

      I'd greatly appreciate any advice on resolving this issue. Thanks!