
4th Generation AMD EPYC™ Processors Excel at Electronic Design Automation (EDA) Workloads

raghu_nambiar

Electronic Design Automation (EDA) integrates hardware, software, and services to facilitate the process of defining, planning, designing, implementing, testing, and manufacturing semiconductors. Early chips were designed manually and occasionally drawn by hand. The rapid increase in transistor density from a few thousand to tens of billions necessitated tools to assist and automate this process. Modern semiconductor designs typically require many months and millions of dollars to complete. Physical samples are then created, followed by additional months and millions of dollars’ worth of rigorous testing to identify and rectify any defects. This iterative cycle represents a significant investment of both time and money. Despite these challenges, design teams face immense pressure to deliver functional, high-performing designs as quickly as possible. That pressure, combined with cost and time-to-market considerations, makes efficient design, accurate simulation, and thorough validation essential.

 

EDA encompasses the entire workflow from start to finish, with a focus on performance and efficiency. It enables designers to invent and prototype designs in a simulated environment, which is often faster and more cost-effective than manufacturing physical prototypes. The continuous growth in design complexity plus design enhancements offered by artificial intelligence further underscores the need for increasingly powerful computing resources to drive future semiconductor design innovations. 4th Gen AMD EPYC™ processors are exceptionally well-suited for Electronic Design Automation (EDA) workloads, and this assertion isn't just a statement—it's a critical part of AMD's strategy. We rely on current AMD EPYC processors to design future AMD EPYC processors, thereby underscoring the importance of maximizing AMD EPYC performance for EDA workloads to ensure our ongoing success as a company.

 

This blog summarizes the results obtained from evaluating 3rd and 4th Gen AMD EPYC processors across a diverse range of EDA workloads. It also includes links to published performance briefs that offer additional details about the performance results and comparisons between the tested AMD EPYC processors. Let's begin by previewing some of the performance enhancements discussed in this blog. The EDA workflow is multifaceted, with varying compute demands at each stage. Certain workloads benefit most from AMD EPYC processors equipped with AMD 3D V-Cache™ technology (9004X processors), while others perform best on high-frequency AMD EPYC processors (9004F processors). This blog also highlights the top-of-stack processors from each generation, particularly in cases where the workload benefits from high core-count CPUs. In such scenarios, throughput increases as the core count rises, potentially resulting in a throughput boost of up to ~60% per server. Thanks to the higher maximum core count of 4th Gen AMD EPYC processors compared to their 3rd Gen counterparts, this may allow you either to increase throughput within the same data center footprint or to maintain job throughput while reducing that footprint. Let's delve deeper into these numbers by examining our test methodology first, followed by a discussion of each individual workload.
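As a rough illustration of the footprint tradeoff described above, the following Python sketch estimates how many servers would be needed to hold total job throughput constant when per-server throughput rises. The function name, the 100-server fleet, and the use of 1.6x (the "up to ~60%" case) are illustrative assumptions, not measured results.

```python
import math


def servers_needed(current_servers: int, per_server_uplift: float) -> int:
    """Servers required to match the current fleet's total throughput.

    per_server_uplift is the throughput of a new server relative to an
    old one (1.0 means no change). Both inputs are hypothetical here.
    """
    if per_server_uplift <= 0:
        raise ValueError("per_server_uplift must be positive")
    # Round up: a fraction of a server still requires a whole server.
    return math.ceil(current_servers / per_server_uplift)


# Hypothetical fleet of 100 servers with a 1.6x per-server uplift.
print(servers_needed(100, 1.6))
```

Real consolidation ratios depend on the workload mix, licensing, memory, and power limits, so this arithmetic is only a starting point.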

 

Testing and Results

AMD performed all testing on single-socket servers powered by selected 16, 32, and top-of-stack 64- and 96-core 3rd and 4th Gen AMD EPYC processors. We compared the performance of systems powered by different 4th Gen AMD EPYC processors relative to their 3rd Gen AMD EPYC counterparts, where the 3rd Gen test results were set to a baseline of 1.00x.

 

Some of the workloads presented below are cache bound and benefit from 4th Gen AMD EPYC processors with AMD 3D V-Cache technology, which triples the shared L3 cache per CCD from the 32 MB of general-purpose EPYC 9004 processors to 96 MB, with 8-12 CCDs per socket. This translates to a total L3 cache size of up to 1,152 MB for 4th Gen AMD EPYC processors with AMD 3D V-Cache technology versus up to 384 MB for general-purpose and high-frequency 4th Gen AMD EPYC processors. High-frequency 4th Gen AMD EPYC processors are ideal for workloads that are compute bound. EDA tools are generally sensitive to core performance. CPUs generally trade off core count against per-core performance, mostly because of frequency limits and contention on shared resources. Customers seek a balance between overall compute cost and workload productivity. This is why we investigated the tradeoff between per-core performance and the total number of cores by testing AMD EPYC processors with varying core counts. These tests show the compelling value proposition of 4th Gen AMD EPYC processors for EDA workloads based on individual customer requirements.
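The total L3 cache figures quoted above follow directly from the per-CCD capacity and the CCD count. A minimal sketch of that arithmetic, where the function name is an illustrative assumption and the inputs come from the paragraph above:

```python
def total_l3_mb(ccds: int, l3_per_ccd_mb: int) -> int:
    """Total L3 cache per socket, in MB, for a given CCD count."""
    return ccds * l3_per_ccd_mb


# 12 CCDs at 96 MB each (with AMD 3D V-Cache technology).
print(total_l3_mb(12, 96))  # 1152
# 12 CCDs at 32 MB each (general-purpose and high-frequency parts).
print(total_l3_mb(12, 32))  # 384
```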

We utilized the test results to calculate the following metrics:

  • Runtime: This metric captures the elapsed runtime for each instance in seconds. These runtimes were aggregated and then divided by the number of concurrent instances of the workload (e.g., 2 on a 16-core system) to derive the average runtime for that benchmark. We ran three iterations of each benchmark on each server. Finally, the mean runtimes of each workload instance running on the system were averaged to determine the application performance on a fully loaded system.
  • Throughput: This metric measures the number of jobs completed per hour and is calculated as (1 / average runtime) multiplied by the number of concurrent jobs. For instance, if a 32-core system is executing 4 concurrent jobs with an average runtime of 2.5 hours, then the throughput equals (1 / 2.5) * 4 = 1.6 jobs per hour.
  • Performance-per-Watt: This metric represents the throughput divided by the average socket power in watts, as measured by turbostat v21.05.04 (PkgWatt metric) at 5-second intervals throughout each test duration.
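The three metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness AMD used: the function names are hypothetical, the runtime helper simply averages per-instance runtimes from one iteration, and the power samples are made-up values standing in for turbostat PkgWatt readings.

```python
from statistics import mean


def average_runtime_hours(instance_runtimes_s):
    """Mean elapsed runtime across concurrent instances, in hours."""
    return mean(instance_runtimes_s) / 3600.0


def throughput_jobs_per_hour(avg_runtime_hours, concurrent_jobs):
    """Jobs per hour: (1 / average runtime) * concurrent jobs."""
    return (1.0 / avg_runtime_hours) * concurrent_jobs


def performance_per_watt(throughput, pkg_watt_samples):
    """Throughput divided by the average sampled socket power."""
    return throughput / mean(pkg_watt_samples)


# Example from the text: 4 concurrent jobs averaging 2.5 hours each.
runtimes_s = [9000, 9000, 9000, 9000]  # 9000 s = 2.5 hours
avg_h = average_runtime_hours(runtimes_s)
tput = throughput_jobs_per_hour(avg_h, concurrent_jobs=4)
print(f"{tput:.1f} jobs per hour")  # 1.6 jobs per hour

# Hypothetical power samples in watts, for illustration only.
print(performance_per_watt(tput, [300.0, 320.0, 340.0]))
```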

This blog provides the composite average uplifts for the processors tested across all three metrics: Runtime, Throughput, and Performance-per-Watt. For instance, if a processor exhibits average uplifts of ~1.28x for Runtime, ~1.33x for Throughput, and ~1.26x for Performance-per-Watt, then the reported uplift for that processor will be ~1.29x, which represents the composite average of all three metrics.
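The composite average in the example above is a plain arithmetic mean of the three per-metric uplifts. A short sketch, with a hypothetical function name:

```python
from statistics import mean


def composite_uplift(runtime_x, throughput_x, perf_per_watt_x):
    """Arithmetic mean of the three per-metric uplifts."""
    return mean([runtime_x, throughput_x, perf_per_watt_x])


# Values from the example: (1.28 + 1.33 + 1.26) / 3 = 1.29
print(f"~{composite_uplift(1.28, 1.33, 1.26):.2f}x")  # ~1.29x
```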

Please see the published performance briefs for each workload for detailed information about per-metric uplifts, system configurations, and more. The test results are summarized in tables 1 and 2 below.

 

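| Workload                        | 9184X vs. 7373X (16 cores) | 9384X vs. 7573X (32 cores) | 9684X (96c) vs. 7773X (64c) |
|---------------------------------|----------------------------|----------------------------|-----------------------------|
| Synopsys VCS®                   | ~1.19x                     | ~1.28x                     | ~1.55x                      |
| Synopsys PrimeSim™ SPICE        | ~1.27x                     | ~1.43x                     | ~1.67x                      |
| Siemens® Tessent™               | ~1.25x                     | ~1.29x                     | ~1.60x                      |
| Synopsys Formality® Equivalence | ~1.02x                     | ~1.13x                     | ~1.37x                      |
| Cadence® Spectre® X             | ~1.13x                     | ~1.26x                     | ~1.55x                      |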

Table 1: Generational AMD 3D V-Cache uplifts on selected EDA workloads

 

| Workload                              | 9174F vs. 73F3 (16 cores) | 9374F vs. 75F3 (32 cores) | 9654 (96c) vs. 7763 (64c) |
|---------------------------------------|---------------------------|---------------------------|---------------------------|
| Synopsys Fusion Compiler™ (Synthesis) | ~1.19x                    | ~1.23x                    | ~1.51x                    |
| Synopsys Fusion Compiler™ (Placement) | ~1.19x                    | ~1.23x                    | ~1.49x                    |
| Synopsys Fusion Compiler™ (Routing)   | ~1.22x                    | ~1.25x                    | ~1.50x                    |
| Siemens® Calibre® nmDRC               | ~1.17x                    | ~1.27x                    | ~1.60x                    |
| Ansys® RedHawk-SC™                    | ~1.20x                    | ~1.28x                    | Not tested                |
| Synopsys PrimeTime® Suite             | ~1.25x                    | ~1.32x                    | Not tested                |

Table 2: Generational high frequency & general-purpose uplifts on selected EDA workloads

 

Synopsys VCS®

The Synopsys VCS® functional verification solution provides innovative features to achieve high performance and enable shift-left verification flows early in the design cycle. Design Intent Verification (DIV) and Dynamic Test Loading (DTL) are the latest features included in the product. All testing used Synopsys VCS to verify graphics core raytracing performance based on OpenCL™. We ran two series of tests for this workload (Tests A and B).

For Test A, AMD utilized a single processor core to execute one full copy of the Synopsys VCS application. Consequently, all processors were tested with a number of simultaneous jobs equal to the total processor cores available. For example, a system equipped with a 32-core AMD EPYC 7573X processor ran 32 simultaneous copies of the full application, as determined by the equation 32 CPU cores / 1 core per copy = 32 copies. This workload benefits most from the enhanced cache capacity available in 4th Gen AMD EPYC processors featuring AMD 3D V-Cache technology.

Figure 1: Composite average Synopsys VCS uplifts (Test A)

 

For Test B, AMD conducted experiments using varying numbers of concurrent full copies of the Synopsys VCS application to assess processor performance under diverse load conditions. The number of concurrent jobs ranged from 1 job to a quantity equivalent to the number of processor cores available. For example, each 16-core processor was subjected to 1, 2, 3, 4, 5, 6, 7, 8, 12, and 16 concurrent jobs.

These tests aimed to gauge processor performance under escalating workload scenarios, which exerted increasing stress on both the compute cores and available L3 cache. Generally, augmenting the number of concurrent jobs led to reduced runtime while simultaneously elevating both throughput and energy consumption.

Please see Synopsys VCS® Performance Uplifts for complete details, including both Test B results and uplifts obtained from general purpose and high frequency 4th Gen AMD EPYC processors across both tests.

 

Synopsys Fusion Compiler™

Synopsys Fusion Compiler™ serves as a unified tool designed for Register Transfer Level to Graphic Data System version II (RTL-to-GDSII) implementation. This comprehensive and integrated design and implementation environment combines advanced optimization algorithms with low-power design techniques and support for advanced process nodes. Its primary objective is to facilitate efficient and highly performant integrated circuit designs.

Our testing of Synopsys Fusion Compiler encompassed synthesis, placement, and routing functionalities to evaluate AMD EPYC performance across these key areas of the application.

Synthesis

The synthesis stage automatically transforms a Register-Transfer Level (RTL) model of a design into a gate-level netlist. AMD utilized 8 processor cores to execute each instance of the full Synopsys Fusion Compiler application for synthesis. All processors were therefore tested with a number of simultaneous jobs equal to the total processor cores divided by 8. For example, a system featuring a 32-core AMD EPYC 75F3 processor ran 4 simultaneous copies of the full application, calculated as 32 CPU cores / 8 cores per copy = 4 copies.
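The job-count rule used throughout these tests (total cores divided by cores per copy) can be written as a one-line helper. The function name is an illustrative assumption; the cores-per-copy values are the ones this blog states for each workload.

```python
def concurrent_copies(total_cores: int, cores_per_copy: int) -> int:
    """Number of simultaneous application copies on a fully loaded system."""
    if cores_per_copy <= 0:
        raise ValueError("cores_per_copy must be positive")
    # Integer division: only whole copies can run.
    return total_cores // cores_per_copy


print(concurrent_copies(32, 8))  # 4  (e.g., Fusion Compiler on a 32-core CPU)
print(concurrent_copies(32, 1))  # 32 (e.g., VCS Test A on a 32-core CPU)
print(concurrent_copies(32, 2))  # 16 (e.g., Formality on a 32-core CPU)
```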

Our testing employed Synopsys Fusion Compiler for synthesizing a 5nm GFX SOC with 664k instances. The general-purpose 4th Gen AMD EPYC 9654 processor exhibited the highest generational uplift for this workload. However, at lower core counts, high-frequency 4th Gen AMD EPYC processors emerged as the optimal choice for this particular workload.

Figure 2: Composite average Synopsys Fusion Compiler uplifts (Synthesis)

 

Please see Synopsys Fusion Compiler™ Performance Uplifts – Synthesis for complete details, including uplifts obtained using 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

Placement

Placement involves determining the precise locations of electronic components, circuits, and logic elements within the physical space available on the wafer. AMD used 8 processor cores to execute each instance of the full Synopsys Fusion Compiler application for placement. All systems underwent testing with a number of simultaneous jobs equal to the total processor cores divided by 8. For example, a system equipped with a 32-core AMD EPYC 75F3 processor ran 4 simultaneous copies of the full application.

The testing focused on Synopsys Fusion Compiler for 5nm tile placement and ideal clock optimization for a GFX SOC with 664k instances. Notably, the general-purpose 4th Gen AMD EPYC 9654 processor demonstrated the highest generational uplift for this workload. However, at lower core counts, high-frequency 4th Gen AMD EPYC processors proved to be the optimal choice for this workload.

Figure 3: Composite average Synopsys Fusion Compiler uplifts (Placement)

 

Please see Synopsys Fusion Compiler™ Performance Uplifts – Placement for complete details, including uplifts obtained using 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

Routing

Routing is the process that determines the precise configuration of wires necessary to connect all of the placed components within a circuit design. AMD utilized 8 processor cores to execute each instance of the full Synopsys Fusion Compiler application for routing. All processors underwent testing with a number of simultaneous jobs equal to the total processor cores divided by 8. For instance, a system equipped with a 32-core AMD EPYC 75F3 processor ran 4 simultaneous copies of the full application.

The testing revolved around Synopsys Fusion Compiler for routing a 5nm GFX SOC with 664k instances. The general-purpose 4th Gen AMD EPYC 9654 processor demonstrated the most significant generational improvement for this workload. Lower core counts favored high-frequency 4th Gen AMD EPYC processors.

Figure 4: Composite average Synopsys Fusion Compiler uplifts (Routing)

 

Please see Synopsys Fusion Compiler™ Performance Uplifts – Routing for complete details, including uplifts obtained using 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

 

Siemens® Calibre® nmDRC

Siemens® Calibre® nmDRC is a crucial internal sign-off design rule checking (DRC) solution utilized by major foundries due to its continuous functionality innovation and leading performance and capacity. AMD employed 8 processor cores to execute each instance of the Siemens Calibre nmDRC application, so all processors were tested with a number of simultaneous jobs equal to the total processor cores divided by 8. For instance, a system with a 32-core AMD EPYC 75F3 ran 4 simultaneous copies of the application.

The testing focused on Siemens Calibre nmDRC GFX SoC Tile design, conducting DRC checks on 165M geometries in a 5nm tile. The general-purpose 4th Gen AMD EPYC 9654 processor demonstrated the most significant generational improvement for this workload. Lower core counts favored high-frequency 4th Gen AMD EPYC processors.

Figure 5: Composite average Siemens Calibre nmDRC uplifts

 

Please see Siemens® Calibre® nmDRC Performance Uplifts for complete details, including uplifts obtained using 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

 

Ansys® RedHawk-SC™

Ansys® RedHawk-SC™ is a reliable multiphysics signoff solution for digital designs, offering robust analytics for identifying weaknesses and optimizing power and performance through "what-if" explorations. AMD utilized 8 processor cores to execute each instance of the complete Ansys RedHawk-SC application. Consequently, all processors were tested with a number of simultaneous jobs equal to the total processor cores divided by 8. For instance, a system featuring a 32-core AMD EPYC 75F3 concurrently ran 4 copies of the full application.

The dynamic IR analysis of a 5nm GFX SoC design encompassing 541K components was conducted using the Ansys RedHawk-SC L2 SRAM Array design. The high-frequency 32-core 4th Gen AMD EPYC 9374F processor exhibited the highest generational uplift for this workload.

Figure 6: Composite average Ansys RedHawk-SC uplifts

 

Please see Ansys® RedHawk-SC™ Performance Uplifts for complete details, including uplifts obtained using 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

 

Synopsys PrimeSim™ SPICE

Synopsys PrimeSim™ SPICE is an advanced Simulation Program with Integrated Circuit Emphasis (SPICE) engine, offering breakthrough performance for large post-layout analog, RF, SerDes, DRAM, and Flash designs. AMD utilized 8 processor cores to execute each instance of the complete Synopsys PrimeSim SPICE application. This resulted in all processors being evaluated with a number of concurrent tasks equivalent to the total processor cores divided by 8. For instance, a system with a 32-core AMD EPYC 7573X ran 4 simultaneous instances of the full application.

The testing focused on the Synopsys PrimeSim SPICE L2 SRAM Array design, conducting a transient analysis of a 5nm SRAM array containing 1.1 million transistors. This workload particularly benefits from the expanded cache capacity offered by 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

Figure 7: Composite average Synopsys PrimeSim SPICE uplifts

 

Please see Synopsys PrimeSim™ SPICE Performance Uplifts for complete details, including uplifts obtained from general purpose and high frequency 4th Gen AMD EPYC processors.

 

Siemens® Tessent™

The Siemens® Tessent™ Test Solutions product suite offers a range of silicon test and operations applications and IP designed to tackle the manufacturing test, debug, and yield ramp challenges associated with today's most intricate SoCs (System-on-Chips). AMD utilized a single processor core to execute a complete instance of the Siemens Tessent application. All processors were tested with a number of simultaneous jobs equal to the total number of processor cores available. For instance, a system equipped with a 32-core AMD EPYC 7573X concurrently ran 32 copies of the application.

Testing involved generating test patterns for a 5nm GFX SOC tile comprising 664K instances. This workload leverages the expanded cache capacity offered by 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

Figure 8: Composite average Siemens Tessent uplifts

 

Please see Siemens® Tessent™ Performance Uplifts for complete details, including uplifts obtained from general purpose and high frequency 4th Gen AMD EPYC processors.

 

Synopsys Formality® Equivalence

Synopsys Formality® is an equivalence-checking (EC) solution that employs formal, static methods to ascertain whether two versions of a design are functionally identical. It offers features for Engineering Change Order (ECO) support and advanced debugging to aid users in implementing and verifying ECOs. Formality is compatible with all standard Design Compiler® and Fusion Compiler™ optimizations, thus ensuring high-quality, fully verifiable outcomes.

AMD utilized 2 processor cores to execute each instance of the full Synopsys Formality Equivalence application. All processors were tested with a number of simultaneous jobs equal to the total processor cores divided by 2. For instance, a system featuring a 32-core AMD EPYC 7573X ran 16 simultaneous copies of the full application, calculated as 32 CPU cores / 2 cores per copy = 16 copies.

Testing involved verifying equivalence between RTL and gate implementation of a 5nm GFX SOC tile with 664K instances. This workload benefited from the expanded cache available in 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

Figure 9: Composite average Synopsys Formality uplifts

 

Please see Synopsys Formality® Equivalence Performance Uplifts for complete details, including uplifts obtained from general purpose and high frequency 4th Gen AMD EPYC processors.

 

Cadence® Spectre® X

Cadence® Spectre® X Simulator facilitates high-speed and high-capacity simulation tasks through its massively distributed workload capability. It excels in SPICE-accurate simulation with rapid convergence, scalable performance, and a specialized mode for managing extremely large system simulations.

AMD employed 8 processor cores for each instance of the complete Cadence Spectre X application. All processors were tested with a number of simultaneous instances equal to the total processor cores divided by 8. For instance, a system featuring a 32-core AMD EPYC 7573X ran 4 concurrent instances of the complete application. Affinity (CPU pinning) was used during Cadence Spectre X testing, in line with Cadence's recommendation to pin CPUs on the same NUMA node when feasible.
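The blog does not describe how pinning was configured, so the following is only a hedged sketch of one way to assign each 8-core instance its own set of CPUs on Linux. It assumes that consecutive logical CPU IDs share a NUMA node, which is not guaranteed and should be checked against the actual topology (for example with `lscpu` or `numactl --hardware`). The helper names are hypothetical; `os.sched_setaffinity` is a standard-library call available on Linux.

```python
import os


def core_groups(total_cores: int, cores_per_instance: int) -> list:
    """Split CPU IDs 0..total_cores-1 into consecutive equal-sized sets.

    Assumption: consecutive IDs sit on the same NUMA node. Verify this
    on the target system before relying on it.
    """
    if cores_per_instance <= 0:
        raise ValueError("cores_per_instance must be positive")
    count = total_cores // cores_per_instance
    return [
        set(range(i * cores_per_instance, (i + 1) * cores_per_instance))
        for i in range(count)
    ]


def pin_current_process(cores: set) -> bool:
    """Pin the calling process to `cores`; return False if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # e.g., on platforms without this Linux-only call
    os.sched_setaffinity(0, cores)
    return True


# A 32-core system running 8-core instances yields 4 groups.
groups = core_groups(32, 8)
print(len(groups))  # 4
```

In practice each application instance would call something like `pin_current_process(groups[i])` at startup, or be launched through a tool such as `taskset` or `numactl`.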

Testing specifically focused on the Cadence Spectre X SPICE DDR PLL transient analysis of a 5nm DDR PLL containing 1.8 million nodes. This workload benefited from the enlarged cache offered by 4th Gen AMD EPYC processors equipped with AMD 3D V-Cache technology.

Figure 10: Composite average Cadence Spectre X uplifts

 

Please see Cadence® Spectre® X Performance Uplifts for complete details, including uplifts obtained from general purpose and high frequency 4th Gen AMD EPYC processors.

 

Synopsys PrimeTime® Suite

The Synopsys PrimeTime® signoff solution suite comprises PrimeTime, PrimeTime SI, PrimeTime ADV, and PrimeTime PX. PrimeTime offers a unified signoff solution, incorporating intelligent methodologies for timing, signal integrity, power, timing constraint, and variation-aware analysis. AMD utilized all processor cores to run a single instance of the complete Synopsys PrimeTime Suite application. Each processor executed a single job, utilizing all available cores. For instance, a system equipped with a 32-core AMD EPYC 75F3 ran a single instance of the application utilizing all 32 CPU cores.

During testing, the Synopsys PrimeTime Suite was employed for Large GFX Top-Level design Flat SI setup timing analysis, focusing on one corner of the design. The 5nm design consisted of 181 million leaf cells, ports, hierarchies, and 332 million nets. Notably, the high-frequency 32-core 4th Gen AMD EPYC 9374F processor exhibited the highest generational uplift for this workload.

Figure 11: Composite average Synopsys PrimeTime Suite uplifts

 

Please see Synopsys PrimeTime® Suite Performance Uplifts for complete details, including uplifts obtained using 4th Gen AMD EPYC processors with AMD 3D V-Cache technology.

 

Conclusion

The Electronic Design Automation (EDA) process is a multifaceted workflow crucial for efficiently and promptly designing modern semiconductors. Each stage of this workflow imposes distinct computational demands to ensure efficacy and performance. Given that EDA plays a critical role in semiconductor production, performance is of utmost importance to manage time-to-market effectively. 4th Gen AMD EPYC processors remain at the forefront, providing the high performance and efficiency demanded by modern high-performance computing workloads, including EDA tasks. The remarkable performance of AMD EPYC processors is evidenced by their accumulation of over 300 world records across all current generations. EDA powered by AMD EPYC processors stands central to our ongoing success, with AMD utilizing these processors to design future products, underscoring our commitment to excellence.

About the Author
Raghu Nambiar currently holds the position of Corporate Vice President at AMD, where he leads a global engineering team dedicated to shaping the software and solutions strategy for the company's datacenter business. Before joining AMD, Raghu served as the Chief Technology Officer at Cisco UCS, instrumental in driving its transformation into a leading datacenter compute platform. During his tenure at Hewlett Packard, Raghu made significant contributions as an architect, pioneering several groundbreaking solutions. He is the holder of ten patents, with several more pending approval, and has made extensive academic contributions, including publishing over 75 peer-reviewed papers and 20 books in the LNCS series. Additionally, Raghu has taken on leadership roles in various industry standards committees. Raghu holds dual Master's degrees from the University of Massachusetts and Goa University, complemented by completing an advanced management program at Stanford University.