AMD EPYC™ 7003 Series Processors Deliver Outstanding Technical Computing Performance

raghu_nambiar · 06-16-2022

This blog is the latest in my series on 3rd Gen AMD EPYC™ processors and the significant performance uplifts they deliver for technical computing(1) workloads in both the cloud and the datacenter. It also discusses the latest AMD EPYC 7003 Series Processors with AMD 3D V-Cache™ technology. These processors reach a new level of performance by tripling the L3 cache compared to standard 3rd Gen AMD EPYC processors: from a maximum of 256MB to 768MB per CPU, and from 32MB to 96MB per core complex die (CCD).
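The tripling can be checked with simple arithmetic. This assumes eight CCDs per processor, the configuration that yields the 256MB and 768MB maximums, and that AMD 3D V-Cache stacks a 64MB cache die on top of each CCD's native 32MB of L3:

```latex
\begin{aligned}
\text{standard: } & 8~\text{CCDs} \times 32~\text{MB} = 256~\text{MB} \\
\text{with 3D V-Cache: } & 8~\text{CCDs} \times (32 + 64)~\text{MB} = 8 \times 96~\text{MB} = 768~\text{MB} \\
\text{ratio: } & 768~\text{MB} / 256~\text{MB} = 3
\end{aligned}
```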

Performance uplifts in technical workloads can translate into real-world benefits such as:

Performing more work in a given amount of time, thereby boosting productivity.
Performing more iterations in a given amount of time, thereby boosting accuracy.
Allowing you to use fewer cloud instances, thereby reducing usage costs.
Allowing you to purchase and maintain fewer servers, thereby reducing both upfront and ongoing costs while maintaining your current productivity.

Let’s begin by discussing the advantages in a cloud environment and then move into the datacenter.

Maximizing Technical Computing Performance and Flexibility in the Cloud

Many enterprises opt to deploy workloads on public clouds because rapid scaling and pay-as-you-go pricing can eliminate the need to build and maintain an on-premises datacenter, while adding capacity exactly when it is needed most. 3rd Gen AMD EPYC processors deliver solid performance uplifts across multiple cloud providers and instance types, including Google Cloud C2D instances. Figure 1 shows the relative performance of Google Cloud C2D vs. prior N2D instances. For more information, including the performance uplifts shown below, please see the Google Cloud blog post Introducing Compute Optimized VMs Powered by AMD EPYC Processors.

Figure 1: Selected AMD EPYC generational performance uplifts on Google Cloud instances(2)

Amazon testing with the Siemens® Simcenter Computational Fluid Dynamics (CFD) application shows that Amazon EC2 Hpc6a instances deliver solid scaling and cost advantages compared to c5n instances. These results show that Hpc6a instances provide near-linear scale-out performance, and even slight super-linear scaling,(3) at scales up to 400 nodes, at approximately 70% lower cost than c5n instances.(4) You can find out more about Amazon EC2 Hpc6a instances in the AWS announcement cited in footnote 4. Meanwhile, Microsoft® Azure® HBv3 instances now feature 3rd Gen AMD EPYC processors with AMD 3D V-Cache technology. More on this below.
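Footnote 3 defines linear scaling as N nodes delivering N times single-node performance, and super-linear scaling as anything greater. It also allows a ±2% margin. The following Python sketch encodes that definition. It assumes one reading of the margin: results within ±2% of N count as linear, results above that band count as super-linear, and anything below the band is labeled "sub-linear". The function name and that last label are illustrative, not AMD's.

```python
def classify_scaling(nodes: int, speedup: float, margin: float = 0.02) -> str:
    """Classify scale-out efficiency against ideal linear scaling.

    `speedup` is the N-node result relative to single-node performance (1.00x).
    Assumption: results within +/- margin of N are "linear"; above that band
    they are "super-linear"; below it they are "sub-linear".
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    efficiency = speedup / nodes  # 1.0 means perfectly linear scaling
    if efficiency > 1.0 + margin:
        return "super-linear"
    if efficiency >= 1.0 - margin:
        return "linear"
    return "sub-linear"


print(classify_scaling(2, 2.0))   # linear: 2 nodes give exactly 2x
print(classify_scaling(8, 7.9))   # linear: within the 2% margin
print(classify_scaling(64, 127))  # super-linear
```

The last call uses the 64-node Ansys Fluent result on HBv3 with AMD 3D V-Cache discussed later in this post (127 nodes' worth of performance).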

Remember that performance uplifts can translate into higher productivity, better designs, and/or cost savings from using fewer instances. The inherent flexibility of cloud service providers allows you to tailor the number and types of instances in real time to best suit your immediate needs without sacrificing your longer-term goals. The tremendous performance of 3rd Gen AMD EPYC processors helps you get the most from your IT budget.

Getting the Most from Your Datacenter Technical Computing Investment

3rd Gen AMD EPYC processors shine in a variety of technical computing workloads. Figures 2, 3, and 4 show 32-core AMD EPYC 7543 and 75F3 processors outperforming the competition on selected technical computing workloads that include Finite Element Analysis (FEA) and CFD applications. These are apples-to-apples comparisons because all of the compared processors have 32 cores.(5)

Figure 2: Sample Altair® Radioss® datacenter performance uplifts(6)

Figure 3: Sample Ansys® LS-DYNA® datacenter performance uplifts(7)

Figure 4: Sample Ansys® CFX® performance uplifts(8)

You can find additional information about each of these workloads in the technical briefs cited in footnotes 6 through 8.

We are also proud to support leading seismic oil and gas exploration applications, such as Emerson® Echos™. Seismic energy exploration fires pulses (called shots) into the ground, then captures and processes the returning pulses.

The significant (and, in many cases, world record) performance uplifts offered by 3rd Gen AMD EPYC processors matter because a datacenter represents a sizable investment in fixed, finite hardware resources. It also requires supporting infrastructure (space, power, access control, cooling, disaster recovery, etc.) to bring up and maintain that hardware. Unlike a cloud deployment, a datacenter cannot easily be scaled up or down, so you must accurately plan current and future needs to gain the most value from these hard assets. 3rd Gen AMD EPYC processors are available in a wide array of core counts and frequencies, which helps you optimize your datacenter infrastructure for each workload from both a performance and a software licensing perspective.

Supersizing the L3 Cache Supersizes Performance

AMD 3D V-Cache die-stacking technology may further enhance the technical computing performance uplifts provided by 3rd Gen AMD EPYC processors compared to both 2nd Gen AMD EPYC processors and the competition. This can be especially true for workloads that are memory bandwidth bound.

AMD 3D V-Cache effectively triples the amount of L3 cache offered by standard 3rd Gen AMD EPYC processors. Giving memory-intensive applications that much cache can significantly increase the cache hit-to-miss ratio, because more of the working set fits directly in cache instead of spilling into slower main memory. Combining AMD 3D V-Cache with the core and memory advantages of 3rd Gen AMD EPYC processors can significantly improve the performance of many technical computing applications.
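The textbook average memory access time (AMAT) formula shows why a higher hit rate matters: AMAT = hit time + miss rate × miss penalty. The following minimal Python sketch applies it to a single cache level. The latencies and hit rates are made-up illustrative values, not measurements of any AMD EPYC processor. For simplicity, the sketch also keeps the L3 hit latency the same for both cases.

```python
def average_access_time_ns(hit_time_ns: float, hit_rate: float, miss_penalty_ns: float) -> float:
    """Single-level AMAT: hit time + miss rate * miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


# Hypothetical, illustrative numbers only.
L3_HIT_NS = 15.0        # assumed L3 hit latency
DRAM_PENALTY_NS = 90.0  # assumed extra cost of going to main memory

baseline = average_access_time_ns(L3_HIT_NS, 0.70, DRAM_PENALTY_NS)      # smaller cache
bigger_cache = average_access_time_ns(L3_HIT_NS, 0.90, DRAM_PENALTY_NS)  # larger cache
print(f"{baseline:.1f} ns -> {bigger_cache:.1f} ns")  # prints: 42.0 ns -> 24.0 ns
```

Under these assumptions, raising the hit rate from 70% to 90% cuts the miss term from 27ns to 9ns. That is why workloads whose working set nearly fits in cache can benefit disproportionately.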

AMD 3D V-Cache in the Cloud

AMD 3D V-Cache technology is already proving its value in the cloud. Microsoft Azure recently launched HBv3 instances powered by 3rd Gen AMD EPYC processors with AMD 3D V-Cache technology. These new instances can offer significantly higher performance than both prior HBv3 instances powered by 3rd Gen AMD EPYC processors without AMD 3D V-Cache technology and HBv2 instances powered by 2nd Gen AMD EPYC processors.

Figures 5, 6, 7, and 8 show the relative 64-node scaling performance of these instances running both CFD and weather forecasting applications.

Figure 5: Relative Converge CFD™ performance of Microsoft Azure instances(9)

Figure 6: Relative Ansys® Fluent® performance of Microsoft Azure instances(10)

Figure 7: Relative Simcenter STAR-CCM+™ performance of Microsoft Azure instances(11)

Figure 8: Relative WRF® performance of Microsoft Azure instances(12)

You can find additional information about each of these workloads in the following technical briefs:

Converge™ CFD (CFD)
Ansys® Fluent® (CFD)
Simcenter STAR-CCM+™ (CFD)
WRF® (weather forecasting)

These briefs also include scaling performance results with different node counts.

In all cases, the HBv3 instances with AMD 3D V-Cache technology provide super-linear scaling, giving you several additional instances' worth of computing power. For example, when HBv3 instances without AMD 3D V-Cache technology run Ansys Fluent scaled to 64 nodes, their super-linear scaling delivers an additional 11 nodes' worth of computing power. When the same workload is scaled to the same node count on HBv3 instances with AMD 3D V-Cache technology, however, 64 nodes of compute deliver an incredible 127 nodes' worth of performance. That is effectively 63 additional instances of compute for free, or 52 more than the HBv3 instances without AMD 3D V-Cache technology.
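These numbers come from the same subtraction described in footnote 5. First, subtract the linear result (the node count) from each measured result. Then subtract one surplus from the other. The following minimal Python sketch uses only the Ansys Fluent figures quoted above. The helper name is hypothetical.

```python
def extra_nodes_of_performance(nodes: int, relative_perf: float) -> float:
    """Surplus over linear scaling, in single-node units (linear result == nodes)."""
    return relative_perf - nodes


NODES = 64
standard_perf = NODES + 11  # HBv3 without AMD 3D V-Cache: 11 extra nodes' worth (quoted above)
vcache_perf = 127           # HBv3 with AMD 3D V-Cache: 127 nodes' worth (quoted above)

extra_standard = extra_nodes_of_performance(NODES, standard_perf)  # 11
extra_vcache = extra_nodes_of_performance(NODES, vcache_perf)      # 63
advantage = extra_vcache - extra_standard                          # 52
print(extra_vcache, advantage)  # prints: 63 52
```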

Here again, performance uplifts can translate into the real-world benefits I described above. Instances with AMD 3D V-Cache technology can deliver all of the flexibility and scalability of the cloud and may allow you to run fewer instances, thereby helping you get the most from your finite IT budget.

AMD 3D V-Cache in the Datacenter

AMD 3D V-Cache technology also delivers superb performance uplifts in the datacenter. Figures 9, 10, and 11 show the performance of several single-node, dual-socket systems powered by 3rd Gen AMD EPYC processors with AMD 3D V-Cache technology: 7373X (16 cores), 7473X (24 cores), 7573X (32 cores), and 7773X (64 cores).

All comparisons show the AMD EPYC processors outperforming dual-socket systems powered by 32-core Intel® Xeon® Platinum 8362 processors across multiple CFD applications and weather forecasting. AMD EPYC processors may even outperform the competition with fewer cores. For example, a system powered by dual 16-core AMD EPYC 7373X processors has only half the cores of the Intel Xeon Platinum system but still delivers approximately 25% higher performance running Ansys CFX.
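For software licensed by core count, per-core performance is often the number that matters. The following minimal Python sketch normalizes the approximate Ansys CFX result above by total cores. It assumes the quoted "approximately 25% higher performance" can be treated as a 1.25x ratio. The helper name is illustrative.

```python
def per_core_ratio(perf_ratio: float, cores_a: int, cores_b: int) -> float:
    """Per-core performance of system A relative to system B."""
    return perf_ratio * cores_b / cores_a


amd_cores = 2 * 16    # dual 16-core AMD EPYC 7373X
intel_cores = 2 * 32  # dual 32-core Intel Xeon Platinum 8362

print(per_core_ratio(1.25, amd_cores, intel_cores))  # prints: 2.5
```

In other words, roughly 1.25x the throughput from half the cores works out to roughly 2.5x the performance per core.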

Figure 9: Sample Altair® AcuSolve® single-node AMD 3D V-Cache performance uplifts(13)

Figure 10: Sample Ansys® Fluent® single-node AMD 3D V-Cache performance uplifts(14)

Figure 11: Sample Ansys® CFX® single-node AMD 3D V-Cache performance uplifts(15)

You can find additional information about each of these workloads in the technical briefs cited in footnotes 13 through 15.

Single-node performance is only part of the exciting AMD 3D V-Cache story. Adding compute nodes reduces the portion of the dataset that each node must process. At a certain point, each node's portion fits entirely within that node's L3 cache, and the resulting performance boost can produce a strong super-linear scaling effect. Figures 12, 13, and 14 show examples of this on 8-node clusters. Each figure compares the scaling of 3rd Gen AMD EPYC processors with AMD 3D V-Cache technology against both standard 3rd Gen AMD EPYC processors and simple linear scaling.
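The following Python sketch shows the "fits in cache" threshold. It finds the smallest node count at which an evenly split working set fits in each node's L3. Several assumptions apply: dual-socket nodes, 768MB of L3 per CPU with AMD 3D V-Cache versus 256MB standard, and a hypothetical 10GB working set. It also ignores halo exchange, code, and OS data, as well as the fact that L3 is physically split per CCD. The sketch is illustrative only.

```python
import math


def min_nodes_to_fit(working_set_mb: float, l3_per_node_mb: float) -> int:
    """Smallest node count at which an evenly split working set fits in each node's L3."""
    if l3_per_node_mb <= 0:
        raise ValueError("l3_per_node_mb must be positive")
    return math.ceil(working_set_mb / l3_per_node_mb)


VCACHE_NODE_L3_MB = 2 * 768    # assumed dual-socket node, 768MB L3 per CPU
STANDARD_NODE_L3_MB = 2 * 256  # assumed dual-socket node, 256MB L3 per CPU

working_set_mb = 10 * 1024  # hypothetical 10GB working set

print(min_nodes_to_fit(working_set_mb, VCACHE_NODE_L3_MB))    # prints: 7
print(min_nodes_to_fit(working_set_mb, STANDARD_NODE_L3_MB))  # prints: 20
```

Under these assumptions, the larger cache reaches the fit-in-cache regime at about a third of the node count. That is consistent with super-linear scaling appearing at smaller cluster sizes.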

Figure 12: Sample Altair® AcuSolve® AMD 3D V-Cache super-linear scaling performance uplift(16)

Figure 13: Sample Ansys® Fluent® AMD 3D V-Cache super-linear scaling performance uplift(17)

Figure 14: Sample OpenFOAM® AMD 3D V-Cache super-linear scaling performance uplift(18)

You can find additional information about each of these workloads in the technical briefs cited in footnotes 16 through 18.

Getting the most from AMD 3D V-Cache technology requires scaling to enough nodes for a higher percentage of the overall dataset to fit into processor cache. The resulting performance uplifts, however, can lead to significant cost savings by allowing you to purchase fewer servers.

Beyond the Hardware

Great hardware is just the beginning. AMD is also proud to offer the AMD Optimizing C/C++ and Fortran Compilers (AOCC) and the AMD Optimizing CPU Libraries (AOCL). These software stacks allow developers to further leverage the performance of AMD EPYC processors. They have already been integrated into open-source HPC applications across verticals that include weather, geoscience, life science, finance, manufacturing, and CFD, and they come with best-performance Spack recipes that ease deployment. AOCL is integrated into Ansys® Mechanical and into products from MathWorks® and COMSOL®. Additional integrations across both Independent Software Vendor (ISV) and open-source applications are ongoing.

Why Choose 3rd Gen AMD EPYC Processors?

The Frontier supercomputer, powered by AMD EPYC processors and AMD Instinct™ accelerators, is the fastest computer in the world today, according to the Top500 list from June 1, 2022. Frontier is scheduled to go live and begin solving the world’s biggest scientific challenges later this year. Simpler challenges in the cloud and the datacenter also benefit from AMD EPYC performance, which can translate into real-world benefits and give you the flexibility to choose the combination of benefits that best suits your needs.

The number of workloads that benefit from the incredible performance of AMD EPYC processors, with and without AMD 3D V-Cache technology, continues to grow. I encourage you to read the ever-expanding library of AMD and curated third-party Performance Briefs, Solution Briefs, and other technical documentation. There you can learn in detail how 3rd Gen AMD EPYC processors can help you meet your essential current and future business needs, and quite possibly some aspirational goals as well.

Raghu Nambiar is a Corporate Vice President of Data Center Ecosystems and Solutions for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

FOOTNOTES

1. “Technical Computing” or “Technical Computing Workloads” as defined by AMD can include: electronic design automation, computational fluid dynamics, finite element analysis, seismic tomography, weather forecasting, quantum mechanics, climate research, molecular modeling, or similar workloads. GD-204
2. See https://cloud.google.com/blog/products/compute/introducing-compute-optimized-vms-on-amd-epyc-milan.
3. AMD defines “linear scaling” as an equal and proportionate application performance uplift relative to single node performance; that is, when scaling out to 2 nodes results in 2x the performance of a single node, scaling out to 4 nodes results in 4x the performance of a single node, and so forth. “Super-linear” scaling is when the performance uplift achieved by adding one or more node(s) is greater than linear. AMD allows a +/-2% margin of error when claiming linear or super-linear scaling. GD-205
4. Please see https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc6a-instance-optimized-for-high-performance-comput....
5. All three of these tests compare the 8-node super-linear scaling performance of the 64-core AMD EPYC 7773X processor with AMD 3D V-Cache technology against the standard 64-core AMD EPYC 7763 processor, where single-node performance always equals 1.00x and where 8-node linear scaling performance always equals 8.00x. The number of additional compute nodes' worth of performance is derived by subtracting the 8.00x linear scaling performance uplift from both the rounded AMD EPYC 7773X and rounded AMD EPYC 7763 performance, and then subtracting the AMD EPYC 7763 performance from the AMD EPYC 7773X performance.
6. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-pb-altair-radioss-icelake-performance-compa...; see Figure 2.
7. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-pb-ansys-lsdyna-icelake.pdf; see Figure 2.
8. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-sb-ansys-cfx-icelake.pdf; see Figure 2.
9. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-azure-hbv3-converge-cfd.pdf; see Figure 2.
10. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-azure-hbv3-ansys-fluent.pdf; see Figure 2.
11. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-azure-hbv3-siemens-star-ccm.pd...; see Figure 2.
12. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-azure-hbv3-wrf.pdf; see Figure 1.
13. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-altair-acusolve.pdf; see Figure 3.
14. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-ansys-fluent.pdf; see Figure 3.
15. Please see https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-ansys-cfx.pdf; see Figure 3.
16. Altair AcuSolve: ~14.9x (with AMD 3D V-Cache technology) and ~10.8x (standard) running the impinging nozzle benchmark. See https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-altair-acusolve.pdf; see Figure 4.
17. Ansys Fluent: Average of ~10.9x (with AMD 3D V-Cache technology) and ~8.9x (standard) running the exhaust_system_33m, combustor_71m, landing_gear_15m, and aircraft_14m benchmarks. See https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-ansys-fluent.pdf; see Figures 4 and 5.
18. OpenFOAM: ~20.0x (with AMD 3D V-Cache technology) and ~11.7x (standard) running the ofoam-1084646 benchmark. See https://www.amd.com/system/files/documents/amd-epyc-7003-3d-vcache-pb-openfoam.pdf; see Figure 5.