

Journeyman III

AMD terrible performance with QEMU IVSHMEM

We are using QEMU's Inter-VM Shared Memory (IVSHMEM) on an AMD Ryzen 9 5900X processor.
From QEMU's specification: IVSHMEM is designed to share a memory region between multiple QEMU processes running different guests and the host. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR. On a Linux VM, a UIO driver is used to map that BAR into user space.
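For reference, this is roughly how user space reaches a UIO-mapped region. Under the kernel's UIO convention, map N of `/dev/uioX` is selected by passing N times the page size as the `mmap` offset. A minimal sketch in C; the helper names are hypothetical, and the assumption that the shared memory is exposed as map 1 (with the BAR0 registers as map 0) depends on the UIO driver in use, so check `/sys/class/uio/uioX/maps/` on your system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* UIO convention: map N of /dev/uioX is selected with mmap offset N * page size. */
off_t uio_map_offset(int map_index)
{
    assert(map_index >= 0);
    return (off_t)map_index * (off_t)sysconf(_SC_PAGESIZE);
}

/*
 * Map `size` bytes of UIO map `map_index` from `path` (for example "/dev/uio0").
 * `size` must not exceed /sys/class/uio/uioX/maps/mapN/size.
 * Returns MAP_FAILED on error.
 */
void *uio_map_region(const char *path, int map_index, size_t size)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, uio_map_offset(map_index));
    close(fd); /* the mapping remains valid after the descriptor is closed */
    return p;
}
```

A typical call would be `uio_map_region("/dev/uio0", 1, shm_size)`, with `shm_size` read from the corresponding sysfs `size` file.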



We benchmarked this technology by transferring blocks of data of different sizes between two Linux VMs, on both an AMD and an Intel processor. IVSHMEM supports interrupts, so our basic test writes a block of data to the shared memory and sends an interrupt to the other VM; the other VM then copies the data out and replies with another interrupt.
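The notification side of this test can be sketched against the register layout in QEMU's ivshmem device specification: BAR0 holds the registers, and interrupting a peer means writing its peer ID and an interrupt vector to the Doorbell register. A minimal sketch in C; the function names are hypothetical, and `ivshmem_wait_irq` assumes a UIO driver that delivers interrupts as a blocking 4-byte `read()` on `/dev/uioX` (some drivers also require writing 1 to the descriptor to re-enable the interrupt, so check the driver you use).

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* BAR0 register offsets in bytes, per QEMU's ivshmem device specification. */
enum {
    IVSHMEM_INTR_MASK   = 0x00, /* legacy INTx only */
    IVSHMEM_INTR_STATUS = 0x04, /* legacy INTx only */
    IVSHMEM_IV_POSITION = 0x08, /* read-only: this device's own peer ID */
    IVSHMEM_DOORBELL    = 0x0C, /* write-only: interrupt a peer */
};

/* Doorbell value: destination peer ID in bits 31..16, vector in bits 15..0. */
static inline uint32_t ivshmem_doorbell_value(uint16_t peer_id, uint16_t vector)
{
    return ((uint32_t)peer_id << 16) | vector;
}

/* Interrupt `peer_id` on `vector` through the mapped BAR0 registers. */
void ivshmem_ring(volatile uint32_t *bar0, uint16_t peer_id, uint16_t vector)
{
    assert(bar0 != NULL);
    bar0[IVSHMEM_DOORBELL / sizeof(uint32_t)] =
        ivshmem_doorbell_value(peer_id, vector);
}

/* Our own peer ID, which the other VM needs in order to address its reply. */
uint32_t ivshmem_own_id(const volatile uint32_t *bar0)
{
    return bar0[IVSHMEM_IV_POSITION / sizeof(uint32_t)];
}

/*
 * Block until the UIO driver reports an interrupt.
 * Returns the driver's total event count, or -1 on error.
 */
int64_t ivshmem_wait_irq(int uio_fd)
{
    uint32_t count;
    if (read(uio_fd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (int64_t)count;
}
```

One round trip is then: copy the block into the shared region, `ivshmem_ring(bar0, peer, 0)`, and `ivshmem_wait_irq(fd)` for the reply.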

In this test, performance on AMD degrades rapidly once the block size exceeds the standard page size (4 KiB). See this plot of round-trip times on both processors: Intel in red, AMD in blue. (Note: the scale is logarithmic.)
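One way to separate the memory system from the virtualization layer is to run the same ping-pong between two processes on a single host. A minimal sketch in C, assuming POSIX process-shared semaphores as a stand-in for IVSHMEM doorbell interrupts and an anonymous shared mapping as a stand-in for the BAR; `pingpong_rtt_ns` is a hypothetical helper, not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Header of the shared region; a hypothetical layout for this sketch. */
struct pingpong {
    sem_t doorbell; /* stands in for the interrupt sent to the peer */
    sem_t reply;    /* stands in for the peer's answering interrupt */
};

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Mean round-trip time in ns for `size`-byte blocks, or -1.0 on error. */
double pingpong_rtt_ns(size_t size, int iters)
{
    assert(size > 0 && iters > 0);
    size_t hdr = (size_t)sysconf(_SC_PAGESIZE); /* data starts on its own page */
    assert(sizeof(struct pingpong) <= hdr);
    size_t len = hdr + size;
    struct pingpong *ctl = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (ctl == MAP_FAILED)
        return -1.0;
    unsigned char *shared = (unsigned char *)ctl + hdr;
    unsigned char *src = malloc(size), *dst = malloc(size);
    if (!src || !dst) {
        free(src); free(dst); munmap(ctl, len);
        return -1.0;
    }
    memset(src, 0xA5, size);
    sem_init(&ctl->doorbell, 1, 0); /* pshared = 1: shared across fork() */
    sem_init(&ctl->reply, 1, 0);

    pid_t pid = fork();
    if (pid < 0) {
        free(src); free(dst); munmap(ctl, len);
        return -1.0;
    }
    if (pid == 0) {                      /* receiver, playing the second VM */
        for (int i = 0; i < iters; i++) {
            sem_wait(&ctl->doorbell);    /* wait for the "interrupt" */
            memcpy(dst, shared, size);   /* copy the block out */
            sem_post(&ctl->reply);       /* notify the sender */
        }
        _exit(0);
    }

    double t0 = now_ns();                /* sender, playing the first VM */
    for (int i = 0; i < iters; i++) {
        memcpy(shared, src, size);       /* write the block */
        sem_post(&ctl->doorbell);        /* "interrupt" the receiver */
        sem_wait(&ctl->reply);           /* wait for its reply */
    }
    double elapsed = now_ns() - t0;

    waitpid(pid, NULL, 0);
    sem_destroy(&ctl->doorbell);
    sem_destroy(&ctl->reply);
    free(src); free(dst);
    munmap(ctl, len);
    return elapsed / iters;
}
```

Calling it for block sizes doubling from 1 KiB upward on each machine gives a bare-metal curve that can be plotted on the same logarithmic scale as the VM results; a cliff that appears only inside the VMs would point at the virtualization path rather than the host memory system.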


We have tried to explain these results without any luck. We checked cache misses, used hugepages, pinned the processes to cores that share a cache, and examined the dTLB and performance profiles. AMD-V and AMD I/O virtualization (AMD-Vi) are, of course, enabled. Main memory and the other components are comparable. Intel even outperforms the AMD workstation when running on laptops and other lower-performance processors. The results are almost identical across many Intel processors, such as:

  • 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
  • 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  • 12th Gen Intel(R) Core(TM) i7-12700K 
  • 10th Gen Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
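The pinning and hugepage experiments can be sketched with standard Linux calls. A minimal sketch in C; the helper names are hypothetical, and they act on the calling thread inside a guest, whereas pinning the QEMU vCPU threads on the host (for example with `taskset`) is a separate step. On a Ryzen 9 5900X the cores are split across two CCDs, each with its own L3 cache, and `/sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list` shows which CPUs share one (assuming `index3` is the L3 on your system; its `level` file confirms this).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mman.h>

/* Pin the calling thread to one CPU; returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Lowest-numbered CPU the calling thread may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/*
 * Allocate a private buffer, preferring huge pages (MAP_HUGETLB needs pages
 * reserved via vm.nr_hugepages). Sets *huge to 1 if huge pages were used.
 */
void *alloc_buffer(size_t size, int *huge)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    *huge = (p != MAP_FAILED);
    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p;
}
```

These only cover the guest-local copy buffers; whether the IVSHMEM region itself is hugepage-backed is decided on the host by the memory backend QEMU is given.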

Is there any design or implementation difference that makes AMD processors worse in this scenario? Are there any known issues with IOVA performance, or with memory handling in general?


2 Replies

Try re-posting your question in the AMD Forum's Develop section and see where the moderator places your thread. You can do so from here:

Adept I

How are you accessing the BAR? Through the generic uio module, or did you write your own driver?

If you wrote your own, how did you map the BAR?

If you did not write your own, what is the path you are using to access the BAR?
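For comparison with the UIO route, a BAR can also be mapped from user space through the PCI sysfs resource files. A minimal sketch in C; the helper names and the BDF `0000:00:05.0` are hypothetical placeholders (IVSHMEM exposes its shared memory as BAR2; look up the actual device with `lspci`). The `resource2_wc` file requests a write-combining mapping while `resource2` does not, so the path used affects how the BAR is mapped. Root privileges are normally required.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Build the sysfs path of BAR `bar` for PCI device `bdf` (e.g. "0000:00:05.0").
 * With `wc`, use resourceN_wc, which requests a write-combining mapping and
 * exists only for prefetchable BARs. Returns 0, or -1 if `buf` is too small.
 */
int pci_bar_path(char *buf, size_t len, const char *bdf, int bar, bool wc)
{
    assert(bdf != NULL && strlen(bdf) > 0);
    assert(bar >= 0 && bar <= 5);
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d%s",
                     bdf, bar, wc ? "_wc" : "");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Map a whole BAR through its sysfs file; the file size equals the BAR size. */
void *pci_map_bar(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    if (p != MAP_FAILED && size_out != NULL)
        *size_out = (size_t)st.st_size;
    return p;
}
```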

Note that we use IVSHMEM for Looking Glass and do not see any of the issues you describe, even while moving gigabytes of data per second over the shared-memory interface.