We tried to use the STREAM memory bandwidth benchmark to discover the topology of our machine. We assumed that when a CPU reads from a remote memory node (one or two hops away), the bandwidth would be noticeably lower than for local access. However, the results do not converge to a single consistent topology, as shown in the attached plot.
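To make the measurement setup concrete, here is a minimal sketch of the kind of sweep we mean: one STREAM run per (CPU node, memory node) pair, pinned with `numactl`. The node count, the `./stream` binary path, and the helper name `stream_commands` are assumptions for illustration, not our exact script.

```python
import itertools


def stream_commands(num_nodes, stream_binary="./stream"):
    """Build one numactl command per (cpu_node, mem_node) pair.

    num_nodes and stream_binary are illustrative assumptions; adjust them
    to the actual host and to where the STREAM binary was built.
    """
    commands = {}
    for cpu_node, mem_node in itertools.product(range(num_nodes), repeat=2):
        commands[(cpu_node, mem_node)] = [
            "numactl",
            f"--cpunodebind={cpu_node}",  # run threads only on this node's CPUs
            f"--membind={mem_node}",      # allocate memory only from this node
            stream_binary,
        ]
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sweep can be reviewed first.
    for pair, cmd in stream_commands(4).items():
        print(pair, " ".join(cmd))
```

Each pair's reported bandwidth then fills one cell of a node-by-node matrix, and the expectation was that cells with the same hop count would cluster together.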
My question is: how can we find out the exact topology of our host? And why do the measured memory access bandwidths not match the CPU topology?
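One cross-check we can think of, assuming the host runs Linux, is the firmware-reported NUMA distance table that the kernel exposes under `/sys/devices/system/node/nodeN/distance`. The sketch below only reads and groups those values; the helper names are hypothetical, and the table reflects what the firmware reports, which may not match the physical interconnect.

```python
from pathlib import Path


def parse_distance_row(text):
    """Parse one sysfs 'distance' file: whitespace-separated integers."""
    return [int(token) for token in text.split()]


def distance_classes(row):
    """Return the distinct distance values in a row, smallest first.

    The smallest value is normally the local node; each larger value
    suggests a further class of remote node as reported by firmware.
    """
    return sorted(set(row))


def read_distance_matrix(root="/sys/devices/system/node"):
    """Read every nodeN/distance file under root (assumes Linux sysfs)."""
    matrix = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        distance_file = node_dir / "distance"
        if distance_file.is_file():
            node_id = int(node_dir.name[len("node"):])
            matrix[node_id] = parse_distance_row(distance_file.read_text())
    return matrix
```

Comparing the distance classes per node against the measured bandwidth matrix would at least show whether the firmware's view and the STREAM results disagree in the same places.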
Any suggestions or help would be greatly appreciated.