

stsfred

Ryzen 3900X nested virtualization issue (VMware, Ubuntu, QEMU)

I know this is a corner case, but the main reason I bought a Ryzen platform was to support my (Cisco) networking studies, which need a lot of cores and a lot of RAM. Let me explain my current situation:

Hardware: Ryzen 3900X, Gigabyte X570 Elite, 4x16 GB G.Skill Flare X 2400 MHz RAM (CL15). Everything at stock: no overclocking, PBO disabled, latest BIOS installed (1.0.0.4).
Software: Windows 10 Home edition, all updates installed, latest chipset drivers installed.
Virtualization: every virtualization feature enabled in the BIOS. Under Windows 10, I use VMware Workstation Player v15.

Under VMware Player I run a virtual appliance named EVE-NG (v2.0.3-105) Community Edition for network device virtualization (www.eve-ng.net). It is free software built on Ubuntu.
Following their guides, I successfully set up various Cisco network devices by installing the relevant images. The guides are here: https://www.eve-ng.net/index.php/documentation/howtos/

Now let's focus on the problematic router and OS version, namely Cisco XRv 9000 (xrv9k) v6.5.1. Official info about this virtual appliance:
https://www.cisco.com/c/en/us/td/docs/routers/virtual-routers/xrv9k-65x/general/release/notes/b-rele...


Inside EVE-NG, this image also runs virtualized, under QEMU v2.12, so this is a nested virtualization scenario.
The problem is that this appliance fails to boot in about 95% of attempts; only about 5% of attempts succeed.
EVE-NG info:

    root@eve-ng:~# uname -a
    Linux eve-ng 4.20.17-eve-ng-ukms+ #2 SMP Wed Jun 5 08:18:06 CEST 2019 x86_64 x86_64 x86_64 GNU/Linux
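Since the router runs under QEMU inside the EVE-NG VM, it can only use KVM acceleration if VMware passes the CPU's virtualization extension (AMD-V, shown as the `svm` CPU flag) through to EVE-NG. A minimal sketch for checking that from inside the EVE-NG VM, assuming a standard Linux `/proc`, `/sys` and `/dev` layout (the script and its function names are hypothetical, not part of EVE-NG):

```python
#!/usr/bin/env python3
# Hypothetical helper (not part of EVE-NG): check, from inside the EVE-NG VM,
# whether the hypervisor below it exposes hardware virtualization so QEMU can
# use KVM. Assumes a standard Linux /proc, /sys and /dev layout.
from pathlib import Path


def cpu_virt_flags(cpuinfo_text: str) -> set:
    """Return which of 'svm' (AMD-V) and 'vmx' (Intel VT-x) appear in the cpuinfo flags."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line like "flags\t\t: fpu vme ... svm ..."
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return flags & {"svm", "vmx"}


def main() -> None:
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    found = cpu_virt_flags(text)
    print("virtualization flags visible in this VM:", ", ".join(sorted(found)) or "none")
    # /dev/kvm is registered once a vendor KVM module (kvm_amd or kvm_intel) loads successfully.
    print("/dev/kvm present:", Path("/dev/kvm").exists())
    for module in ("kvm_amd", "kvm_intel"):
        print(f"{module} loaded:", Path("/sys/module", module).exists())


if __name__ == "__main__":
    main()
```

Run it with `python3` inside the EVE-NG VM. If no `svm` flag is listed, the VMware setting that exposes AMD-V to the guest is presumably not enabled for that VM.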


The boot process stops after a random amount of time (5 seconds, 30 seconds or 3 minutes) with messages like this:

    ################################################################################
    #                                                                              #
    #                  Welcome to the Cisco IOS XRv9k platform                     #
    #                                                                              #
    #    Please wait for Cisco IOS XR to start.                                    #
    #                                                                              #
    #    Copyright (c) 2014-2017 by Cisco Systems, Inc.                            #
    #                                                                              #
    ################################################################################


    Cisco IOS XR console     will start on the 1st serial port
    Cisco IOS XR aux console will start on the 2nd serial port
    Cisco Calvados console   will start on the 3rd serial port
    Cisco Calvados aux       will start on the 4th serial port

The text above shows the normal boot process; then this happens:

    [   10.304380] BUG: unable to handle kernel paging request at ffffffff860449b1
    [   10.304380] IP: [<ffffffff860449b1>] kvm_unlock_kick+0x81/0x90
    [   10.304380] PGD 6a0f067 PUD 6a10063 PMD 60001e1
    [   10.304380] Oops: 0003 [#1] SMP
    [   10.304380] Modules linked in: tun bridge ip6table_filter ip6_tables iptable_filter ip_tables 80d
    [   10.304380] CPU: 1 PID: 4734 Comm: tee Tainted: G           O 3.14.23-WR7.0.0.2_standard #1
    [   10.304380] Hardware name: cisco Cisco IOS XRv 9000, BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu4
    [   10.304380] task: ffff88031c2d8110 ti: ffff8800ba5c0000 task.ti: ffff8800ba5c0000
    [   10.304380] RIP: 0010:[<ffffffff860449b1>]  [<ffffffff860449b1>] kvm_unlock_kick+0x81/0x90
    [   10.304380] RSP: 0018:ffff8800ba5c3d18  EFLAGS: 00010046
    [   10.304380] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
    [   10.304380] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff86f85a00
    [   10.304380] RBP: ffff8800ba5c3d30 R08: ffffffff86da4b00 R09: 0000000000000286
    [   10.304380] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff86f85a00
    [   10.304380] R13: 000000000000080c R14: ffff88031bc4104e R15: ffff88031c4ac000
    [   10.304380] FS:  00007fa68c1c7700(0000) GS:ffff88032dc80000(0000) knlGS:0000000000000000
    [   10.304380] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [   10.304380] CR2: ffffffff860449b1 CR3: 0000000037a89000 CR4: 00000000001406e0
    [   10.304380] Stack:
    [   10.304380]  0000000000000286 ffff88031c1ff800 0000000000000286 ffff8800ba5c3d48
    [   10.304380]  ffffffff8658ee3a ffffffff86f85a00 ffff8800ba5c3d70 ffffffff863670ed
    [   10.304380]  000000000000004e 0000000000000000 0000000000000000 ffff8800ba5c3dc8
    [   10.304380] Call Trace:
    [   10.304380]  [<ffffffff8658ee3a>] _raw_spin_unlock_irqrestore+0x5a/0x70
    [   10.304380]  [<ffffffff863670ed>] uart_start+0x3d/0x50
    [   10.304380]  [<ffffffff86367b7b>] uart_write+0xeb/0x120
    [   10.304380]  [<ffffffff8634ccfd>] n_tty_write+0x1ed/0x540
    [   10.304380]  [<ffffffff8608c830>] ? wake_up_process+0x50/0x50
    [   10.304380]  [<ffffffff86349344>] tty_write+0x174/0x2c0
    [   10.304380]  [<ffffffff8634cb10>] ? process_echoes+0x70/0x70
    [   10.304380]  [<ffffffff86349525>] redirected_tty_write+0x95/0xa0
    [   10.304380]  [<ffffffff861a33ea>] vfs_write+0xba/0x1e0
    [   10.304380]  [<ffffffff861a3e06>] SyS_write+0x46/0xc0
    [   10.304380]  [<ffffffff86597e49>] system_call_fastpath+0x16/0x1b
    [   10.304380] Code: 37 b1 86 48 8d 04 0b 48 8b 38 4c 39 e7 75 cb 0f b7 40 08 66 44 39 e8 75 c1 48
    [   10.304380] RIP  [<ffffffff860449b1>] kvm_unlock_kick+0x81/0x90
    [   10.304380]  RSP <ffff8800ba5c3d18>
    [   10.304380] CR2: ffffffff860449b1
    [   10.304380] ---[ end trace d3e7f193a2285065 ]---

After this, either the boot process stops or the VM restarts.
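For readers decoding the trace above: for an x86 page-fault oops like this one, the hex value after `Oops:` is the page-fault error code, whose low bits are defined by the x86 architecture (bit 0 protection violation vs. not present, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). A small decoder, as a sketch (the function name is hypothetical):

```python
# Decode the x86 page-fault error code printed after "Oops:" in a kernel
# page-fault oops. Bit meanings follow the x86 architecture's definition.
def decode_pf_error_code(code: int) -> list:
    """Return human-readable meanings of the low five error-code bits."""
    meanings = [
        "protection violation" if code & 0x01 else "page not present",
        "write access" if code & 0x02 else "read access",
        "user mode" if code & 0x04 else "kernel mode",
    ]
    if code & 0x08:
        meanings.append("reserved bit set in page table")
    if code & 0x10:
        meanings.append("instruction fetch")
    return meanings


if __name__ == "__main__":
    # Value taken from the "Oops: 0003 [#1] SMP" line of the trace above.
    print(decode_pf_error_code(int("0003", 16)))
```

For the value in this trace it prints `['protection violation', 'write access', 'kernel mode']`.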


I contacted the EVE-NG team about this (they are very helpful), and they told me that they do not support AMD CPUs at all, because AMD CPUs behave unpredictably in nested virtualization and they therefore cannot guarantee anything.
Additionally, someone on their forums stated that the virtualization capabilities of Ryzen CPUs are not as good as Intel's VT-x and VT-d, and that this is why there are issues.
I don't know whether that is true; I am not a virtualization expert.


I also asked about this issue on a Cisco forum, but got nothing but silence. Cisco also has its own non-free EVE-NG-like solution named VIRL, and I had the exact same issue under VIRL: xrv9k failed to boot most of the time.
This is why I didn't renew my subscription last year.


So it seems that this symptom is not caused by the virtualization software (VMware/QEMU/VIRL/EVE-NG); it must be something closer to the hardware (CPU) level.


I googled some of the error messages but couldn't find a solution, only references to old kernels or CPU defects.


I hope someone at AMD or in the AMD community can suggest a workaround or solution for this issue.
