AMD terrible performance with QEMU IVSHMEM
FYI - I'm not the original author of the content below. I've reposted it in this thread from another one, following the suggestion in the original thread, since the original author did not repost it. I'm having the very same situation as described below. Original link: https://community.amd.com/t5/processors/amd-terrible-performance-with-qemu-ivshmem/m-p/516291#M46130
We are using QEMU's Inter-VM Shared Memory (IVSHMEM) technology on an AMD Ryzen 9 5900X processor.
From QEMU's specification: IVSHMEM is designed to share a memory region between multiple QEMU processes running different guests and the host. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR. On a Linux VM you use a UIO driver to map the device's memory.
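As a point of reference, here is a minimal sketch of how a Linux guest might map the IVSHMEM shared-memory BAR through UIO. The device node (/dev/uio0), the map index, and the region size are assumptions; the actual values depend on the UIO driver in use and the device configuration.

```c
/* Minimal sketch: map the IVSHMEM shared-memory BAR via UIO.
 * Assumptions: the ivshmem device is bound to a UIO driver that exposes
 * the shared-memory BAR as map index 1 on /dev/uio0; the map size would
 * normally be read from sysfs, it is hard-coded here for brevity. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHMEM_MAP_INDEX 1              /* assumed: shared-memory BAR as UIO map 1 */
#define SHMEM_SIZE      (4UL << 20)    /* assumed: 4 MiB shared region */

int main(void)
{
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0) { perror("open /dev/uio0"); return 1; }

    /* UIO convention: the mmap offset selects the map, offset = index * page size. */
    off_t offset = (off_t)SHMEM_MAP_INDEX * sysconf(_SC_PAGESIZE);
    void *shm = mmap(NULL, SHMEM_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, offset);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    printf("shared region mapped at %p\n", shm);
    munmap(shm, SHMEM_SIZE);
    close(fd);
    return 0;
}
```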
We benchmarked this technology by communicating blocks of data of different sizes between two Linux VMs, on both an AMD and an Intel processor. IVSHMEM supports interrupts, so our basic test consists of writing a block to the shared memory and sending an interrupt to the other VM; the other VM then copies the data out and notifies completion with another interrupt.
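A rough sketch of one leg of that round-trip test, assuming an ivshmem-doorbell device whose register BAR and shared-memory BAR have already been mapped (for example as in the previous snippet) and whose interrupts are delivered through the UIO read() interface. The register offsets follow QEMU's documented ivshmem-doorbell layout; the peer ID, pointers, and block size are placeholders, not the original authors' code.

```c
/* Sketch of one round trip: write a block into shared memory, ring the
 * peer's doorbell, then block on the UIO fd until the peer's answering
 * interrupt arrives. Offsets follow QEMU's ivshmem-doorbell register
 * layout (Doorbell at 0x0C); `regs`, `shm`, `uio_fd`, `peer_id` and the
 * block size are assumed to be set up elsewhere. */
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

double round_trip_ns(volatile uint32_t *regs, uint8_t *shm,
                     const uint8_t *block, size_t len,
                     uint32_t peer_id, int uio_fd)
{
    struct timespec t0, t1;
    uint32_t irq_count;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    memcpy(shm, block, len);            /* 1. copy payload into shared memory        */
    regs[3] = (peer_id << 16) | 0;      /* 2. doorbell: peer ID in bits 31:16, vector 0 */

    /* 3. wait for the peer's answering interrupt; UIO read() blocks and
     *    returns the cumulative interrupt count. */
    if (read(uio_fd, &irq_count, sizeof(irq_count)) != sizeof(irq_count))
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}
```

The peer side mirrors this: it blocks on its own UIO fd, copies the block out of shared memory, and rings the first VM's doorbell back.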
In this test the performance on AMD degrades rapidly beyond the standard page size (4k). See this plot of round-trip times on both processors: Intel in red, AMD in blue. *Note: the scale is logarithmic.
We have tried to understand these results without any luck: we checked cache misses, used hugepages, pinned processes to cores whose caches are close to each other, checked the dTLB, and took performance profiles. Of course AMD-V and AMD I/O virtualization are enabled. Main memory and the other components are comparable. Intel even beats the AMD workstation with laptop and other low-performance processors. The results are almost identical for many Intel processors, such as:
- 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
- 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
- 12th Gen Intel(R) Core(TM) i7-12700K
- 10th Gen Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
Is there any design or implementation difference that makes AMD processors worse for this scenario? Are there any known issues with IOVA performance, or with memory handling in general?
Thanks,
Please share how you mapped the BAR.
I.e., how did you specify `kdev->uio.mem[0].internal_addr`?
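For readers following along: in a kernel-side UIO driver for a PCI device, `internal_addr` is usually filled in when the BAR is ioremapped in the probe routine, and the memory type chosen for that mapping is one of the things worth checking. A hedged sketch of the common pattern, not the original poster's code (the BAR index and function name are assumptions):

```c
/* Hedged sketch of how a UIO PCI driver typically fills in uio_mem[0]
 * for the IVSHMEM shared-memory BAR in its probe routine. BAR index 2
 * and the use of pci_ioremap_bar() are assumptions. */
#include <linux/pci.h>
#include <linux/uio_driver.h>

static int example_setup_mem(struct pci_dev *pdev, struct uio_info *info)
{
    info->mem[0].addr    = pci_resource_start(pdev, 2);   /* physical BAR address */
    info->mem[0].len     = pci_resource_len(pdev, 2);
    info->mem[0].memtype = UIO_MEM_PHYS;

    /* Kernel-side mapping; pci_ioremap_bar() gives an uncached mapping,
     * ioremap_wc() would give a write-combining one instead. */
    info->mem[0].internal_addr = pci_ioremap_bar(pdev, 2);

    return info->mem[0].internal_addr ? 0 : -ENOMEM;
}
```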
The reply with the code keeps getting stuck in moderation.
Code link: https://www.codepile.net/pile/wWGwx8WX
Also, open(/sys/bus/pci/devices/.../resource2, ...) + mmap(..., 0) shows the same performance (see the sketch below).
I don't use the UIO driver for mapping the IVSHMEM-plain BAR region, although I don't know which method the original author used.
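For completeness, a sketch of the sysfs mapping path mentioned above. The PCI address is a placeholder (the real one comes from lspci), and the BAR size is taken from the resource file itself rather than hard-coded.

```c
/* Sketch: map the IVSHMEM shared-memory BAR directly through sysfs,
 * without any UIO driver. The PCI address 0000:00:10.0 is a placeholder;
 * substitute the ivshmem device's actual address. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:00:10.0/resource2";
    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open resource2"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* resource2 is the shared-memory BAR; offset 0 maps its start. */
    void *shm = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }

    printf("BAR2 mapped at %p (%lld bytes)\n", shm, (long long)st.st_size);
    munmap(shm, st.st_size);
    close(fd);
    return 0;
}
```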
Since the moderation process is so slow here, I will just post the Reddit link: "IVSHMEM is very slow on Ryzen 5900X (and possibly more AMD ones) system" : VFIO (reddit.com)
