No Compromise: Driving Performance and Efficiency with AMD EPYC and SMT
One of the main pillars that vendors of Arm-based processors often cite as a competitive advantage versus x86 processors is a keen focus on energy efficiency and predictability of performance. In the quest for higher efficiency and performance, Arm vendors have largely designed out the ability to operate on multiple threads concurrently—something that most enterprise-class CPUs have enabled for years under the technology description of “SMT”—which was also created in the name of enabling performance and efficiency benefits.
Arm vendors often claim that SMT brings security risks, creates performance unpredictability from shared resource contention, and adds implementation cost and energy consumption. Interestingly, Arm does support multi-threading in its Neoverse E1-class processor family for embedded uses such as automotive. Given these incongruities, this blog intends to provide a bit more clarity to help customers assess what attributes of performance and efficiency really bring them value for their critical workloads.
What is SMT:
Simultaneous Multithreading (SMT) is a technology that allows a CPU core to execute multiple threads simultaneously. Since its inception, SMT has been implemented in many modern processors with varying numbers of threads. The most common approach is 2-way SMT, where two threads execute simultaneously per CPU core—versus each thread running to completion serially—as shown in Figure 1. This blog focuses on 2-way SMT, as implemented in AMD “Zen” processor cores.
Figure 1: Single threaded processing flow compared to SMT process flow.
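To see what 2-way SMT looks like on a running system, here is a minimal Python sketch, assuming a Linux host that exposes CPU topology through sysfs. The helper names `parse_cpu_list` and `threads_per_core` are illustrative and not part of any AMD tool. The sketch reads the list of logical CPUs that share a physical core with a given CPU and counts them.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a Linux CPU list such as "0,192" or "0-1" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def threads_per_core(cpu: int = 0, sysfs_root: str = "/sys/devices/system/cpu"):
    """Count the hardware threads sharing a core with `cpu`.

    Returns None when the topology file is unavailable (for example on a
    non-Linux system).
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    try:
        return len(parse_cpu_list(path.read_text()))
    except OSError:
        return None


if __name__ == "__main__":
    # With 2-way SMT enabled this is expected to report 2; with SMT off, 1.
    print(threads_per_core())
```

The `sysfs_root` parameter exists only so the function can be pointed at a test directory; on a real system the default should normally be left alone.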
Benefits of using SMT:
SMT is a popular CPU feature because it offers several performance and efficiency benefits:
SMT design challenges:
While SMT adds significant performance to a core, it also presents hardware design challenges for silicon and system vendors to address:
As shown in Figure 2, controls must be in place to meet these principles. To prevent starvation and ensure fairness, in-order queues are statically partitioned while parts of the out-of-order queues and branch prediction are watermarked for each thread, and the rest is competitively shared.
Figure 2: SMT-enabled resource sharing on the “Zen 5” core
How “expensive” is it to implement SMT?
From an end customer perspective, there is no material “cost” for utilizing SMT: it is a built-in function that most x86 customers can freely turn on or off. But in the very practical terms of semiconductor economics, anything that consumes transistor area on chip silicon or consumes energy when running represents a cost. The cost to implement SMT is small and easily offset by the gains it enables. For example, implementing SMT takes less than 5% of the core area in the latest AMD “Zen 4” and “Zen 5” cores. This includes all the necessary logic to allow two threads to share the core’s resources. In easy “manager math”, SMT enables up to 384 threads while consuming less silicon area than 10 physical cores, which is a strong ROI. Additionally, in cases where software is licensed based on the number of physical cores in the system, having the extra performance and capacity enabled by the availability of virtual cores/threads can enable significant cost savings! Now to dispel that pesky energy consumption myth.
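The “manager math” can be spelled out in a few lines of Python. This is only a back-of-the-envelope sketch: it takes the 192 cores per socket cited later in this post and treats the “less than 5%” area figure as an upper bound of exactly 5%. The function name is made up for illustration.

```python
def smt_area_in_core_equivalents(physical_cores: int, smt_area_fraction: float) -> float:
    """Total SMT logic area, expressed as a number of whole-core areas."""
    return physical_cores * smt_area_fraction


cores = 192                 # physical cores per socket, as cited in this post
threads = cores * 2         # 2-way SMT doubles the hardware thread count
# 0.05 is the stated upper bound ("less than 5%"), so the result is a ceiling.
upper_bound = smt_area_in_core_equivalents(cores, 0.05)

print(f"{threads} threads; SMT logic costs under {upper_bound:.1f} core-equivalents of area")
# prints: 384 threads; SMT logic costs under 9.6 core-equivalents of area
```

In other words, 192 additional hardware threads are bought for the silicon area of fewer than 10 cores.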
SMT Enables Performance and Efficiency
AMD EPYC processors have established hundreds of performance and efficiency world records. These include workloads that benefit significantly from multithreading and SMT as well as some that do not, such as a number of HPC and technical computing apps. For a separate, broad-based assessment of where SMT brings value and how AMD delivers the goods, independent testing house Phoronix has done perhaps the most complete and consistent analysis of the value of SMT. The latest test results for the “Zen 5” based AMD EPYC 9005 CPUs showed big performance gains on a broad set of tested workloads, including databases, cryptography, and compression workloads, as shown in Chart 1.
Chart 1: SMT Performance gains on AMD EPYC 9005 Systems.
These results are not surprising given that an earlier Phoronix analysis of SMT using prior generation AMD EPYC 9754 platforms identified similar performance and power efficiency gains.3 For those interested in workloads outside of the domains summarized in this chart, note that the Phoronix site provides a comprehensive, detailed analysis of the 170 diverse tests. You’ll find that while a few workloads in technical and high-performance computing do seem to prefer having exclusive use of all physical core resources, many workloads gain incremental performance with SMT enabled.
Importantly, when Phoronix tested 4th and 5th generation EPYC CPUs across a wide variety of workloads it also measured minimal to no difference in power consumption when SMT is enabled vs disabled.
“For workloads able to benefit from SMT, it's still a clear win with AMD EPYC 9005 processors. When looking at all of the CPU power consumption across 170+ benchmarks taking ~13 hours to complete, the data here shows no power consumption difference overall to having SMT enabled”
The significant SMT performance gains (often in the range of 30-50%) combined with virtually no or minimal change in power consumption means that energy efficiency is getting a boost—better performance per watt! SMT is a major contributor to energy efficiency on modern x86 superscalar CPUs such as AMD EPYC™, together with power management and dynamic frequency scaling. The following comments summarize the benefits:
“SMT enabled on the AMD EPYC 9575F on average led to just a 2 Watt increase to the CPU power consumption than when it was disabled.”
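To make the performance-per-watt argument concrete, here is a small worked sketch in Python. The 30-50% performance range and the 2 W increase come from the figures above; the 300 W baseline CPU power is purely an assumption chosen for illustration, not a measured value, and the function name is hypothetical.

```python
def perf_per_watt_gain(perf_gain: float, base_watts: float, extra_watts: float) -> float:
    """Relative change in performance per watt when SMT is enabled.

    perf_gain   -- fractional throughput gain with SMT on (0.30 means +30%)
    base_watts  -- CPU power with SMT off
    extra_watts -- additional CPU power with SMT on
    """
    return (1 + perf_gain) * base_watts / (base_watts + extra_watts) - 1


BASE_WATTS = 300.0   # ASSUMPTION for illustration only, not a measured figure
EXTRA_WATTS = 2.0    # average increase quoted above for the EPYC 9575F

for gain in (0.30, 0.50):
    improvement = perf_per_watt_gain(gain, BASE_WATTS, EXTRA_WATTS)
    print(f"+{gain:.0%} performance -> {improvement:+.1%} performance per watt")
```

Because the added power is so small relative to the baseline, nearly all of the performance gain carries through to performance per watt under these assumptions.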
Why do we see this efficiency gain? When a core is in its normal operating state (C0) executing instructions, a thread that stalls while waiting on data doesn’t move the core to a lower power state to save power, but having a second thread to fill in the gaps can make a big difference in performance. The increased instruction throughput may slightly increase power consumption, but power efficiency improves by much more.
AMD EPYC and SMT: Still delivering great value after all these years
Simultaneous multithreading was developed at a time when core resources were quite precious (one, two or perhaps four cores per socket) and it was essential for customers to be able to squeeze as much processing out of them as possible. In an age where AMD EPYC processors offer up to 192 physical high-performance “Zen 5” cores per socket, it may seem natural to ask if these resources are still quite so precious and if SMT still carries value. If you ask any IT manager struggling to balance incredible growth in demand for compute resources against budgets, you’ll likely hear a resounding “yes”.
While physical cores are now quite plentiful, they are also still quite valuable: there is often a LOT of work to be accomplished, and another significant driver of IT solution cost, software licensing, is often tied to the number of physical cores in the host server! The typical IT shop needs to get the most out of every resource, and having the flexibility to gain incremental compute capacity and performance with as few hardware resources as possible can deliver a powerful ROI. SMT is a compelling option: it allows a relatively “free” performance boost where it can add value, but is also easily disabled where it does not.
References: