I am doing an MMIO read of a 64-bit integer in a memory region backed by a PCIe BAR, on an AMD Zen 3 CPU.
I know that, by default, prefetching does not work on a memory region mapped UC or WC. However, it seems like it should work on a region mapped WT, because WT memory uses the host cache hierarchy (I would need to manage coherence in software on top of that, which is OK in this case).
I tried mapping the MMIO region into kernel space with ioremap_wt() and did not see any caching effect. Specifically, I read the same address in the MMIO region 50 times in a loop, and the total latency was about 50x the latency of a single MMIO read, which suggests 50 round trips across the bus. I expected the total to be much lower than 50x, because the integer should be in the L1 cache after the first MMIO read.
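For reference, the measurement loop described above is roughly equivalent to the following user-space sketch. It is an illustration under assumptions, not the exact kernel code: the helper names now_ns and time_reads_ns are hypothetical, the timer is clock_gettime() rather than a kernel clock, and in the kernel the pointer would be the one returned by ioremap_wt(), presumably read through an accessor such as readq().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds (hypothetical helper). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Time n back-to-back 64-bit loads from the same address and return the
 * elapsed nanoseconds. The volatile qualifier forces the compiler to emit
 * every load; whether a load hits the cache or crosses the bus depends on
 * the memory type of the mapping, not on this code.
 */
static uint64_t time_reads_ns(const volatile uint64_t *addr, int n,
                              uint64_t *last)
{
    assert(n > 0);

    uint64_t value = 0;
    uint64_t start = now_ns();
    for (int i = 0; i < n; i++)
        value = *addr;          /* one 64-bit load per iteration */
    uint64_t end = now_ns();

    *last = value;              /* hand back the last value that was read */
    return end - start;
}
```

The comparison is then time_reads_ns(addr, 50, &v) against time_reads_ns(addr, 1, &v) on the same address. With a cacheable mapping I would expect the first to be far less than 50 times the second.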
I also tried mapping the region as WB (ioremap_cache()) and WP (ioremap_wp()) and saw the same 50x latency in both cases.
Is it correct that Zen 3 ignores PAT attributes other than UC/WC for PCIe BARs? Is there any way to do prefetching on an MMIO region backed by a PCIe BAR?
Thanks for your help.