Hello,
I am building electronic equipment for scientists and am using an Epyc server to process the data. Now I have run into a strange problem.
My device produces a continuous data stream of 12 GByte/sec. The data is transmitted over a PCIe Gen 3.0 x16 connection. It is then stored on an NVMe drive array.
My simplistic calculation of DRAM bandwidth was like this:
My device writes to DRAM: 12 GByte/sec
The SSD reads from DRAM: 12 GByte/sec
Capacity of Epyc memory controller: 204 GByte/sec
-> usage 11.8 %, should be a piece of cake
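As a quick sanity check of this budget, here is a minimal Python sketch of the same arithmetic. The constants are the figures quoted above; the function name is only illustrative. It models nothing beyond the sum of the two DMA streams, so refresh, read/write turnaround, and CPU traffic are deliberately ignored.

```python
# Figures taken from the calculation above, all in GByte/sec.
DEVICE_WRITE_GBPS = 12.0   # device -> DRAM (DMA writes)
SSD_READ_GBPS = 12.0       # DRAM -> NVMe array (DMA reads)
PEAK_DRAM_GBPS = 204.0     # theoretical peak of the memory controller


def dram_utilization(write_gbps, read_gbps, peak_gbps):
    """Return the fraction of peak DRAM bandwidth used by both streams."""
    return (write_gbps + read_gbps) / peak_gbps


util = dram_utilization(DEVICE_WRITE_GBPS, SSD_READ_GBPS, PEAK_DRAM_GBPS)
print(f"DRAM utilization: {util:.1%}")
```

The result comes out just under 12 %, so average bandwidth alone does not explain the stalls.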
The problem is that at high data rates, the DMA transfer sometimes freezes for over 100 us.
100 microseconds may not be a lot for a typical server application. But in the real-time world it is a lot. It requires my device to buffer 1.2 MB of data, which it can't do.
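The buffer requirement follows directly from data rate times stall duration. A minimal sketch of that relation, plus the inverse question (how long a stall a given buffer can absorb), is below. The 256 KiB buffer size is a purely hypothetical example value, not the actual buffer in my device.

```python
BYTES_PER_SEC = 12_000_000_000   # 12 GByte/sec stream, from the post
STALL_US = 100                   # observed DMA freeze, in microseconds


def buffer_needed_bytes(rate_bytes_per_sec, stall_us):
    """Bytes that accumulate in the device while the DMA is stalled."""
    return rate_bytes_per_sec * stall_us // 1_000_000


def max_stall_us(buffer_bytes, rate_bytes_per_sec):
    """Longest stall (in microseconds) a buffer of this size can ride out."""
    return buffer_bytes * 1_000_000 / rate_bytes_per_sec


needed = buffer_needed_bytes(BYTES_PER_SEC, STALL_US)
print(f"Buffer needed for a {STALL_US} us stall: {needed / 1e6:.1f} MB")

# Hypothetical on-device buffer of 256 KiB, for illustration only.
EXAMPLE_BUFFER_BYTES = 256 * 1024
print(f"Tolerable stall: {max_stall_us(EXAMPLE_BUFFER_BYTES, BYTES_PER_SEC):.1f} us")
```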
A few remarks:
1.) The problem is not caused by the OS, my software, interrupts etc. It is the memory subsystem. My DMA controller writes data into a ring buffer without any CPU intervention. I am very sure about this fact.
2.) The CPU cores do not need a lot of DRAM bandwidth. The CPU load is 4 %. The CPU spends most of its time in the NVMe driver.
3.) Only writing to memory is fine. But as soon as I start accessing the disk at the same time, the error occurs. (This also happens with CrystalDiskMark running in the background.)
4.) I am using an Epyc 7352 CPU and 8 DIMMs with 16 GByte each (DDR4-3200). My motherboard is a Supermicro MBD-H12SSL-NT.
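For reference, the 204 GByte/sec peak used in the calculation further up can be derived from this memory configuration. The sketch below assumes the eight DIMMs populate eight independent 64-bit DDR4 channels, one DIMM per channel, and that "3200" means 3200 MT/s. It gives a theoretical ceiling, not a sustained figure.

```python
CHANNELS = 8                          # assumed: one DIMM per memory channel
TRANSFERS_PER_SEC = 3_200_000_000     # DDR4-3200 = 3200 mega-transfers/sec
BYTES_PER_TRANSFER = 8                # 64-bit data bus per channel


def peak_dram_bytes_per_sec(channels, transfers_per_sec, bytes_per_transfer):
    """Theoretical peak DRAM bandwidth across all channels, in bytes/sec."""
    return channels * transfers_per_sec * bytes_per_transfer


peak = peak_dram_bytes_per_sec(CHANNELS, TRANSFERS_PER_SEC, BYTES_PER_TRANSFER)
print(f"Theoretical peak: {peak / 1e9:.1f} GByte/sec")
```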
These are my questions:
-----------------------
1.) Is there a known hiccup in the 7002 series that could cause this?
2.) Do you expect an improvement with the 7003 series?
3.) Do I just expect too much of modern server hardware? Is a server CPU just not built for real time?
4.) Do you agree that the "Preferred IO" BIOS setting should bring an improvement? (It doesn't.)
5.) The "High Performance Computing: Tuning Guide for AMD EPYC™ 7002 Series Processors" says the following:
"Preferred IO allows one PCIe device in the system to be configured in a preferred state. This
device gets preferential treatment on the infinity fabric."
Since RAM and PCIe are located on the same die, the Infinity Fabric should not be involved here. Is my understanding wrong?
6.) Is there anything else I could try?
Many thanks for your answer!