

ObKo

Sporadic PCIe latency increase on Epyc Rome

Hello, we have run into an issue with PCIe on the AMD Epyc 7002 platform.

Our platform is a SuperMicro H11DSi-NT motherboard with 2x Epyc 7272, 16x DDR SODIMMs (all channels populated), and Windows 10.

We're developing our own FPGA-based PCIe device for real-time processing. It has a PCIe scatter-gather DMA controller with bus mastering.

We've noticed that on the Epyc platform, PCIe read latency (non-posted memory read requests from the device) is typically 3-10us, but sometimes (roughly 10-20 times per second) it increases to 180-200us. Our device has a data buffer for only ~190us, so data is lost in those cases.
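To make the numbers above concrete, here is a minimal Python sketch of how captured read latencies could be summarized against the buffer depth. The function name `summarize_read_latency`, the 100us spike threshold, and the sample values in the usage lines are illustrative assumptions, not output from our device; only the ~190us buffer figure comes from the description above.

```python
from typing import Sequence


def summarize_read_latency(
    samples_us: Sequence[float],
    capture_seconds: float,
    spike_us: float = 100.0,   # assumed threshold separating "typical" from "spike"
    buffer_us: float = 190.0,  # approximate on-device buffer depth from the post
) -> dict:
    """Count latency spikes and buffer overruns in a capture of read latencies."""
    if capture_seconds <= 0:
        raise ValueError("capture_seconds must be positive")

    # A spike is any read that took at least spike_us to complete.
    spikes = [s for s in samples_us if s >= spike_us]
    # An overrun is a spike long enough to exhaust the device buffer.
    overruns = [s for s in spikes if s >= buffer_us]

    return {
        "spikes_per_second": len(spikes) / capture_seconds,
        "overruns": len(overruns),
        "worst_us": max(samples_us, default=0.0),
    }


if __name__ == "__main__":
    # Hypothetical capture: mostly 3-10us reads with two long stalls in 2 seconds.
    report = summarize_read_latency([5, 8, 185, 4, 200, 9], capture_seconds=2.0)
    print(report)
```

With a summary like this, a 185us stall counts as a spike that the buffer still absorbs, while a 200us stall counts as an overrun, which matches the data loss we see.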

We've tried to:

  • Change the PCIe slot
  • Bind all memory allocation in software to NUMA node with PCIe device
  • Update UEFI Firmware to latest
  • Enable "Performance" power mode in OS
  • Play with the "NUMA nodes per socket" BIOS option
  • Enable "Performance" determinism slider in UEFI
  • Disable SMT, C-States, DF States in UEFI
  • APBDIS = 1, SoC P-state = P0 in UEFI
  • Disable BMC VGA (Aspeed 2500)

Nothing changed; the behavior is the same. Moreover, we tested another (completely different) device under Linux and saw the same numbers: 180us of read latency several times per second. AMDuProfPcm shows that all data passes through a single CPU, so the NUMA memory binding is correct.
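For the Linux test, the device's NUMA node can also be cross-checked from sysfs, independently of AMDuProfPcm. The sketch below is only an illustration: the helper name `pci_numa_node` and the PCI address in the usage lines are hypothetical, while the `numa_node` attribute under `/sys/bus/pci/devices/` is the standard Linux sysfs entry.

```python
from pathlib import Path


def pci_numa_node(bdf: str, sysfs_root: str = "/sys") -> int:
    """Return the NUMA node Linux reports for a PCI device.

    bdf is the full PCI address, e.g. "0000:41:00.0" (hypothetical here).
    The kernel reports -1 when it has no NUMA affinity information.
    """
    path = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "numa_node"
    return int(path.read_text().strip())


if __name__ == "__main__":
    # Replace with the address shown by lspci for the device under test.
    try:
        print(pci_numa_node("0000:41:00.0"))
    except FileNotFoundError:
        print("device not found in sysfs")
```

The returned node is the one that DMA buffers and the processing threads should be bound to.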

The next step was removing one of the CPUs: same issue. But after setting APBDIS = 1 and SoC P-state = P0 in UEFI, the issue is gone with a single CPU, with a stable 5-10us of latency. Inserting the second CPU brings the issue back immediately. It looks like an inter-socket communication or power management issue. The problem is that we need two CPUs for the software in our project, so we can't just remove the second CPU.

Does anyone know anything about this behavior?
