I have a question about identifying the topology of a 4P AMD Opteron 6100 system.
We have an HP ProLiant DL585 G7 server with four AMD Opteron 6136 processors, which show up as 8 dies/NUMA nodes. They are supposed to comply with the Socket G34 topology. However, there are multiple possible topologies for this 4P/8-node interconnect, as described in this study (http://portal.nersc.gov/project/training/files/XE6-feb-2011/Architecture/Opteron-Memory-Cache.pdf).
We tried to use the STREAM memory bandwidth benchmark to discover the topology of our machine. We assumed that when a CPU reads from a remote memory node (1 hop or 2 hops away), the bandwidth would be noticeably lower than for local access. However, the results do not converge to a single consistent topology, as shown in the attached plot.
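To make the inference concrete, here is a minimal Python sketch of the kind of classification we have in mind. It is only an illustration: the function names, the 0.6 ratio threshold, and the 4-node bandwidth numbers are all made up for the example and are not our measured results.

```python
from typing import List, Tuple


def estimate_hops(bw: List[List[float]], threshold: float = 0.6) -> List[List[int]]:
    """Guess hop counts from a bandwidth matrix.

    bw[i][j] is the bandwidth measured with threads pinned to node i
    and memory allocated on node j. Each entry is compared with node i's
    local bandwidth bw[i][i]. The threshold is an assumption, not a
    hardware constant.
    """
    hops = []
    for i, row in enumerate(bw):
        local = row[i]
        hop_row = []
        for j, value in enumerate(row):
            if i == j:
                hop_row.append(0)          # local access
            elif value / local >= threshold:
                hop_row.append(1)          # looks like a direct link
            else:
                hop_row.append(2)          # looks like an extra hop
        hops.append(hop_row)
    return hops


def asymmetric_pairs(hops: List[List[int]]) -> List[Tuple[int, int]]:
    """Return node pairs (i, j) whose estimated distance differs by direction."""
    n = len(hops)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if hops[i][j] != hops[j][i]]


# Hypothetical GB/s values for a 4-node machine (NOT real measurements).
example_bw = [
    [10.0, 7.0, 7.2, 4.0],
    [7.1, 10.2, 4.1, 7.0],
    [7.0, 6.9, 9.8, 7.1],
    [4.2, 7.0, 7.0, 10.1],
]

if __name__ == "__main__":
    hops = estimate_hops(example_bw)
    for row in hops:
        print(row)
    print("asymmetric pairs:", asymmetric_pairs(hops))
```

In this made-up data, nodes 1 and 2 look two hops apart in one direction and one hop apart in the other, which is the sort of inconsistency that prevents the measurements from mapping onto a single topology.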
My question is: how can we find out the exact topology of our host? And why do the measured memory access bandwidths not match the CPU topology?
Any suggestions or help would be greatly appreciated.