I spoke with some other engineers. I hope this is useful. You might consider moving to a G34-based platform.
The Barcelona parts do not have HT Assist, while the Istanbul parts do. With HT Assist enabled, probe traffic is reduced significantly, so the coherent HT links are freed from much of that traffic. As a result, even when the system is doing small amounts of work or is idle, traffic on the Barcelona system should be higher than on the Istanbul one.
One way to verify your results would be to do a remote bandwidth test, such as STREAM. You can compare your bandwidth measurement on the coherent HT link with the STREAM result; they should roughly match.
Some things to consider with your bandwidth measurement:
- The counters on a coherent HT link only measure transmitted data, not both transmitted and received.
- The northbridge counters for a particular node are shared among all cores for that node, so you will probably want to separate the data by node and divide by the number of cores in that node.
- To get bi-directional bandwidth, you also need to consider the data that other nodes are transmitting.
- For NUMA node 0, there are also some split coherent HT links and the non-coherent HT link for the PCI devices.
You could apply a principle of conservation of data volume: whatever the memory controller reads from the memory banks is what flows through the coherent HT links and is consumed by the cores. For more information about the performance events, we recommend the Family 10h BKDG (which covers both Barcelona and Istanbul).
From memory, the maximum traffic through the coherent HT links on Barcelona was around 2.5 GB/s bidirectional, and it would be around 5 GB/s on Istanbul. What results do you get from the STREAM benchmark?
I hope this helps,
Thank you very much for your answer. I agree with you, the traffic on the HT link must be lower on the Barcelona architecture than on the Istanbul one.
Indeed, the Barcelona architecture gives consistent results (no traffic on the link under cpuburn), whereas the Istanbul architecture always reports at least 50% HT link usage...
I did the following experiment: I pinned the memory on node 1 and accessed it from node 0 (read accesses are performed by all cores on the node). It's a home-made benchmark, but STREAM gives very similar results.
* On the Barcelona architecture, I measured a throughput of 2.8GB/s. The link between node 0 and node 1 is used at 28% and the link between node 1 and node 0 is used at 99%. Other links are slightly used for the cache coherency protocol (less than 10%).
* On the Istanbul architecture, I measured a throughput of 4.2 GB/s. The link between node 0 and node 1 is used at 73% and the link between node 1 and node 0 is used at 87%. The other links are reported at 50% usage (which I find hard to believe; probably the same problem as observed with cpuburn).
Note that if I perform write accesses instead of read accesses, the link usages are reversed (i.e., the link from node 0 to node 1 becomes the most used), but the conclusions are the same.
Thanks for your help,