"Bergamo" 4th Gen AMD EPYC™ 97x4 Processors: Built for Cloud Native Workloads

raghu_nambiar
This week, we introduced two additional processor models to the 4th Gen AMD EPYC™ processor family at the Data Center and AI Technology Premiere. I am excited to take this opportunity to provide an update on the software ecosystem readiness and showcase some of the performance proof points for the AMD EPYC 97x4 processors.

AMD continues innovating and creating products that are optimal for specific segments of today’s modern datacenter. AMD EPYC 97x4 processors are the industry’s first x86 processors purpose-built for cloud-native computing. These processors feature up to 128 “Zen 4” cores built on 5nm process technology that deliver industry leading performance, density, and energy efficiency for cloud native workloads. They also include the same cutting-edge technologies found in all 4th Gen AMD EPYC processors, including 12 channels of DDR5 with supported memory speeds up to 4800 MT/s, up to 128 (1P) or 160 (2P) lanes of PCIe® Gen5 delivering 2x the transfer rate of PCIe Gen4, 3rd Gen Infinity Fabric™ delivering 2x the data transfer rate of 2nd Gen Infinity Fabric, AMD Infinity Guard technology to defend your data while in use, and socket compatibility with current 4th Gen AMD EPYC platforms.
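As a rough sanity check on the memory figures above, the theoretical peak bandwidth per socket can be estimated from the channel count and the transfer rate. The sketch below assumes a 64-bit (8-byte) data path per DDR5 channel and DDR5-4800 (4800 mega-transfers per second); the function name is illustrative, and sustained bandwidth on a real system will be lower than this theoretical peak.

```python
def peak_memory_bandwidth_gbs(channels: int,
                              mega_transfers_per_s: int,
                              bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    MT/s multiplied by bytes per transfer gives MB/s per channel;
    dividing by 1000 converts MB/s to GB/s.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# 12 channels of DDR5-4800, assuming 8 bytes per transfer per channel.
per_socket = peak_memory_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)
print(f"Theoretical peak: {per_socket:.1f} GB/s per socket")
```

Under these assumptions the estimate works out to 460.8 GB/s per socket.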

AMD EPYC 97x4 processors continue to extend the leadership performance established by standard AMD EPYC™ 9004 Series Processors. The 300+ world records earned by AMD EPYC 9004 Series processors speak to AMD’s relentless pursuit of performance leadership with industry leading energy efficiency and optimal Total Cost of Ownership (TCO). The industry has responded to these efforts: A rich and growing ecosystem of full-stack solutions and partnerships leverages the innovative features and technologies found in AMD EPYC processors to enable faster time to value for customers’ current and future workload needs.

We are grateful for our broad ecosystem of partners who continue to collaborate with our engineers to deliver a wide range of datacenter solutions, including:

Alibaba Cloud, Altair, AlmaLinux, Amazon Web Services, Anjuna, Ansys, ASRock, Asus, Atos, BEAMR, Broadcom, Cadence, Canonical, Casa Systems, Cisco, Citrix, Cloudera, Couchbase, Dassault Systèmes, Datastax, Dell, Elastic, Equinix, ESI, Excelero, Foxconn, FreeBSD, Gigabyte, Google Cloud, HBC, HPE, IBM Cloud, Inventec, JMA, Juniper, Kioxia, Lenovo, MariaDB, Mavenir, SingleStore, Micron, Microsoft, Mitac, Neural Magic, MongoDB, MSI, MySQL, NetScout, Nokia, Nutanix, Oracle, PGS Software, QCT, Quobyte, Radisys, Red Hat, RedisLabs, Robin, Rocky Linux, Samsung, Shearwater, Siemens Digital Industries Software, SK Hynix, SLB, Splunk, StorMagic, Supermicro, SUSE, Synopsys, Tencent Cloud, TigerGraph, Transwarp, Tyan, Velocix, Vertica, WEKA, VMware, Western Digital, Wiwynn, Wistron and others.

Let’s look under the hood of AMD EPYC 97x4 processors, the markets they serve, and some key performance highlights.

Uncompromised Cloud Native Computing

The new 4th Gen AMD EPYC™ 97x4 processors offer no-compromise computing performance, density, and energy efficiency that meet the needs of growing cloud native environments. Cloud native development practices are emerging as an optimized approach for developers to rapidly deliver efficient and scalable services for a new generation of workloads across a wide variety of industries and verticals. The need for frequent and fast modifications to software stacks in the cloud environment drives an ongoing demand for efficient and scalable architectures. Systems featuring AMD EPYC 97x4 processors address this need by providing a robust, scalable, energy and space-efficient environment to run cloud native services today and in the future at optimal TCO. AMD EPYC 97x4 processors include up to 128 SMT-capable (Simultaneous Multithreading) processor cores, unlocking new levels of performance, energy efficiency, and compatibility.

I am excited to share that AMD EPYC 97x4 processors outperform both Ampere® Altra® Max M128-30 and Intel® Xeon® Platinum 8490H processors for cloud native applications. Here are the results of performance tests that we ran across multiple cloud native workloads that showcase the superior performance of AMD EPYC 97x4 processors.

  • Relational Database Management Systems (RDBMS): MySQL™ is one of the world’s most popular open-source database management systems. It is regarded as a high-performance relational database management system that is reliable and easy to use. The many frameworks and tools that help easily deploy and monitor MySQL helped it emerge as a robust and secure database that is well suited for cloud deployments.

    AMD engineers ran a workload derived from the popular TPC Benchmark™ C on the AMD EPYC 9754 processor and competitive systems. The test methodology included systems under test configured with Ubuntu® v22.04 and MySQL™ v8.0.33. HammerDB v4.4 was run on a different client system to build and generate the workload. Multiple VMs were run on the system under test with 32 vCPUs and 128 GiB memory per VM. Each VM ran one MySQL database with the schema created for 1000 warehouses. All VMs were simultaneously loaded by an individual HammerDB client instance per VM. The workload was run for 10 minutes per run, and the aggregate of median New Orders Per Minute (NOPM) values was recorded across 5 runs per platform to compare relative performance.

    A two-processor AMD EPYC 9754 system outperforms both a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 1. [1]

raghu_nambiar_0-1686931522582.png

Figure 1: MySQL transaction processing performance
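The "aggregate of median NOPM values" metric described above can be sketched as follows. This is one plausible reading, not the exact procedure used in the tests: it assumes the median is taken per VM across its runs and the per-VM medians are then summed. The function names and the sample numbers are hypothetical and are not measured data.

```python
from statistics import median


def aggregate_of_medians(nopm_by_vm: dict[str, list[float]]) -> float:
    """Sum, across VMs, of each VM's median NOPM over its runs.

    Assumption: median per VM first, then sum. The source does not
    spell out the order of the two operations.
    """
    return sum(median(runs) for runs in nopm_by_vm.values())


def relative_performance(system_score: float, baseline_score: float) -> float:
    """Express one platform's score as a multiple of a baseline platform."""
    return system_score / baseline_score


# Hypothetical NOPM values: two VMs, five runs each (illustration only).
example = {
    "vm0": [100.0, 110.0, 105.0, 102.0, 108.0],
    "vm1": [200.0, 190.0, 195.0, 205.0, 198.0],
}
total = aggregate_of_medians(example)
print(f"Aggregate of per-VM medians: {total}")
```

Using the median of several runs rather than a single run or the best run reduces the influence of an unusually fast or slow outlier on the comparison.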

  • Enterprise Java®: Java® has become a universal language across enterprises and clouds. Running Java in cloud environments simplifies development and deployment compared to traditional languages, such as C or C++. The SPECjbb® 2015 benchmark is a popular yardstick that enables fair performance measurements of server-side Java based applications. SPECjbb® 2015 simulates a company with an IT infrastructure that handles a mix of point-of-sale requests, online purchases, and data-mining operations. The rapid adoption of Java across the industry in the last two decades makes this benchmark relevant to all audiences, including Java Virtual Machine (JVM) vendors, hardware developers, Java application developers, researchers, and members of the academic community.

    A two-processor AMD EPYC 9754 system outperforms a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 2. [1]

raghu_nambiar_1-1686931522583.png

Figure 2: Server-Side Java performance

  • NGINX®: Today, it is common to create websites and deploy complete web-serving stacks in the cloud. NGINX® is a popular webserver that can also be used as a reverse proxy, load balancer, mail proxy, and HTTP cache. AMD tested NGINX throughput in requests per second as a high-performance web server in conjunction with the WRK web (http) client.

    The test methodology included systems under test configured with Ubuntu v22.04, NGINX v1.18.0 as the server, and WRK v4.2.0 as the web client. Multiple server and client instances were run on the system under test with 16 vCPUs per instance shared between the web server and client. Both server and client were run on the same system to minimize network latency. Each WRK client created 650 connections fetching a small static binary file from its respective NGINX server. The workload test was run for 90 seconds per run, and the aggregate of median requests per second (rps) values was recorded across 3 runs per platform to compare relative performance.

    A two-processor AMD EPYC 9754 system outperforms a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 3. [1]

raghu_nambiar_2-1686931522584.png

Figure 3: NGINX Webserver performance
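Because each instance in these tests is sized in vCPUs, the number of instances a platform can host scales with its hardware thread count. The sketch below estimates that count under loudly stated assumptions: one vCPU per hardware thread, every thread in use, two threads per core on the AMD and Intel parts, and one thread per core on the Ampere part. The class and function names are illustrative, and the actual instance counts used in the tests are documented in reference [1], not here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # assumption: SMT/Hyper-Threading enabled where available

    @property
    def hardware_threads(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core


def max_instances(system: System, vcpus_per_instance: int) -> int:
    """Upper bound on instances, assuming one vCPU per hardware thread."""
    return system.hardware_threads // vcpus_per_instance


SYSTEMS = [
    System("2P AMD EPYC 9754", sockets=2, cores_per_socket=128, threads_per_core=2),
    System("2P Intel Xeon Platinum 8490H", sockets=2, cores_per_socket=60, threads_per_core=2),
    System("2P Ampere Altra Max M128-30", sockets=2, cores_per_socket=128, threads_per_core=1),
]

for system in SYSTEMS:
    count = max_instances(system, vcpus_per_instance=16)
    print(f"{system.name}: {system.hardware_threads} threads, up to {count} instances")
```

The same helper applies to the other workloads by changing `vcpus_per_instance` (for example 32 for the MySQL VMs or 8 for the Redis instances).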

  • Redis™: Redis™ is an in-memory data structure store used as a distributed, in-memory key–value database, cache, and message broker, with optional durability. Redis is well suited for the cloud and allows for all its standard features including streaming, micro-services, and data analytics to enable hyper-productivity within a tenant. Redis supports various kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indices. Redis works with in-memory datasets to achieve top performance.

    The test methodology included systems under test configured with Ubuntu v22.04 OS, Redis v7.0.11 as the server, and redis-benchmark v7.0.11 as the client. Multiple server and client instances were run on the system under test with 8 vCPUs per instance shared between the server and client. Both server and client were run on the same system to minimize network latency. Each redis-benchmark client created 512 connections setting/getting a key size of 1000 bytes from its respective Redis server. The workload test was run for 10M requests per run, and the aggregate of median requests per second (rps) values was recorded across 3 runs per platform to compare relative performance.

    A two-processor AMD EPYC 9754 system outperforms a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 4.1 and Figure 4.2. [1]

raghu_nambiar_3-1686931522584.png

Figure 4.1: Redis-bench GET performance

 

raghu_nambiar_4-1686931522585.png

Figure 4.2: Redis-bench SET performance
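For readers who want to approximate the Redis parameters described above, the sketch below assembles a `redis-benchmark` command line from them. It is an approximation, not the exact invocation used in the tests: the host, port, and function name are placeholders, and mapping the "1000 bytes" figure to the `-d` option (which sets the SET/GET value size) is an assumption.

```python
import shlex


def redis_benchmark_args(host: str, port: int, test: str) -> list[str]:
    """Build a redis-benchmark argument list for one SET or GET run."""
    return [
        "redis-benchmark",
        "-h", host,
        "-p", str(port),
        "-c", "512",        # parallel connections per client
        "-n", "10000000",   # total requests (10M)
        "-d", "1000",       # data size in bytes (assumed mapping, see lead-in)
        "-t", test,         # which test to run: "set" or "get"
    ]


# Placeholder endpoint; each server instance would have its own port.
for test_name in ("get", "set"):
    print(shlex.join(redis_benchmark_args("127.0.0.1", 6379, test_name)))
```

Building the arguments as a list keeps each option explicit and makes it straightforward to launch one client per server instance with `subprocess.run`.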

  • FFmpeg: FFmpeg is a free, open-source software project that consists of a suite of libraries, codecs, and programs that handle video, audio, and other multimedia files and streams. The core FFmpeg program is designed for command-line processing of video and audio files. It is widely used for encoding, transcoding, editing, video scaling, video post-production, and standards compliance. In this performance characterization, we transcoded a raw 4K resolution input file using the VP9 codec to an MKV output file.

    The test methodology included systems under test configured with Ubuntu v22.04 and FFmpeg v4.4.2. Multiple FFmpeg instances were run on the system under test with 4 vCPUs per FFmpeg instance. Each FFmpeg instance transcoded a single input file with 4K resolution in raw video format on an NVMe drive into an output file with the VP9 codec on a separate NVMe drive. Multiple FFmpeg jobs were run concurrently on each system, and aggregate performance on each system was compared using the median total frames processed per hour across 3 runs.

    A two-processor AMD EPYC 9754 system outperforms a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 5. [1]

raghu_nambiar_5-1686931522585.png

Figure 5: Video transcoding performance
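The frames-per-hour metric used for the FFmpeg comparison can be illustrated with a small calculation. The sketch assumes the aggregate is the sum of each concurrent job's own frame rate; the source does not state whether it was computed this way or as total frames divided by wall-clock time. The function names and sample figures are hypothetical.

```python
def frames_per_hour(frames: int, elapsed_seconds: float) -> float:
    """Convert one job's frame count and runtime into frames per hour."""
    return frames * 3600.0 / elapsed_seconds


def aggregate_frames_per_hour(jobs: list[tuple[int, float]]) -> float:
    """Sum the per-job rates of jobs that ran concurrently.

    Each job is a (frames_processed, elapsed_seconds) pair.
    """
    return sum(frames_per_hour(frames, seconds) for frames, seconds in jobs)


# Hypothetical concurrent jobs (illustration only, not measured data).
jobs = [(1800, 600.0), (1800, 720.0)]
print(f"Aggregate: {aggregate_frames_per_hour(jobs):.0f} frames/hour")
```

The median of this aggregate across the 3 runs per platform would then serve as that platform's score.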

  • Cassandra®: The Apache® Cassandra® database is the right choice when you require scalability and continuous availability without compromising performance. Cassandra also simplifies data distribution in multi-tenant environments and is particularly suited for the cloud. Predictable scalability and proven fault-tolerance in the cloud infrastructure make it an ideal platform for mission-critical data.

    The test methodology included systems under test configured with Ubuntu v22.04, OpenJDK v11, Apache Cassandra v4.1.2 as the server application, and Cassandra-test v4.1.2 as the client. Multiple server and client instances were run on the system under test with 32 vCPUs per instance. Each Cassandra database was started and 1M initial values were added to it. Each client-server instance pair was tested with a 25% write and 75% read mix. Multiple client-server pairs were run concurrently, and aggregate performance on each system was compared using the median aggregate throughput across 5 runs.

    A two-processor AMD EPYC 9754 system outperforms a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 6. [1]

raghu_nambiar_6-1686931522586.png

Figure 6: Cassandra performance

  • Memcached™: Memcached™ is a high-performance, distributed in-memory caching system that stores key-values for small chunks of arbitrary data (strings, objects) from the results of calls (database or API) or rendering pages. Memcached is very popular in the cloud because it can serve cached items fast and allows easy scaling and cost effectiveness for higher loads. Memcached is heavily used for database query results caching, session caching, web page caching, API caching, and caching of objects such as images, files, and metadata. It is simple, powerful and designed for quick and easy development and deployment to solve problems faced by large data caches.

    The test methodology included systems under test configured with Ubuntu v22.04, Memcached v1.6.14 as the server, and memtier v1.4.0 as the client. Multiple server and client instances were run on the system under test with 8 vCPUs per instance shared between the server and client. Both server and client were run on the same system to minimize network latency. Each client created 10 connections, 8 pipelines, and a 1:10 set:get ratio with its respective Memcached server using the Memcached_text protocol. The aggregate median requests per second (rps) values were recorded across 3 runs per platform to compare relative performance.

    A two-processor AMD EPYC 9754 system outperforms a two-processor Ampere Altra Max M128-30 system and a two-processor Intel Xeon Platinum 8490H system. See Figure 7. [1]

raghu_nambiar_7-1686931522586.png

Figure 7: Memcached performance
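The Memcached client settings described above map onto `memtier_benchmark` options roughly as sketched below. This is an approximation rather than the exact invocation used in the tests: the host, port, and function name are placeholders, the thread count is left at the tool's default because the source does not give one, and `memcache_text` is the protocol value as memtier_benchmark spells it.

```python
import shlex


def memtier_args(host: str, port: int) -> list[str]:
    """Build a memtier_benchmark argument list for one Memcached instance."""
    return [
        "memtier_benchmark",
        "--server", host,
        "--port", str(port),
        "--protocol", "memcache_text",  # Memcached text protocol
        "--clients", "10",              # connections per client thread
        "--pipeline", "8",              # concurrent pipelined requests
        "--ratio", "1:10",              # set:get ratio
    ]


# Placeholder endpoint; 11211 is the default Memcached port.
print(shlex.join(memtier_args("127.0.0.1", 11211)))
```

With a 1:10 set:get ratio, roughly one request in eleven is a write, which reflects a read-heavy caching pattern.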

  • VMmark® 3: VMmark® 3 is the industry leading enterprise virtualization consolidation benchmark that measures the performance and scalability of the VMware® vSphere® hypervisor on a variety of hardware vendor platforms. Platforms with AMD EPYC processors currently hold performance leadership in the 3 most competitive categories: 2 Node 4 Total Socket SAN, 4 Node 8 Total Socket vSAN, and Overall Leadership. VMmark3 is popular with consumers of VMware vSphere, both on-premises and in the public cloud, because of its reliable and accurate body of publications hosted on the VMware VMmark3 website.

    AMD is proud of the VM density enabled by our EPYC product family, and the introduction of the AMD EPYC 9754 is no exception. Compared with an Intel Xeon Platinum 8490H 2-Node vSphere cluster, the AMD EPYC 9754 commands an ~89% improvement in virtualized consolidation performance. See Figure 8. [2]

raghu_nambiar_8-1686931522587.png

Figure 8: VMmark3 performance (2 nodes, matched pair)

  • Power efficiency: Datacenter energy consumption is one of the top challenges faced by cloud service providers, as energy cost has become one of the leading factors driving the cost of running datacenters. Platforms powered by AMD EPYC 9754 processors deliver both leading power savings and world record performance as measured by the trusted and established SPECpower_ssj2008 benchmark. Many power efficiency claims use partial data such as idle power or 100% load level, but SPECpower_ssj2008 evaluates power and performance characteristics across the full spectrum of system utilization by measuring both active idle power and graduated load levels that start at 10% and ramp up through 20%, 30%, and eventually 100%.

    A two-processor AMD EPYC 9754 system delivers power efficiency at idle, 10%, 20%, and all the way through 100% load capacity, with overall power efficiency ~1.97x that of a two-processor Intel Xeon Platinum 8490H system and ~2.73x that of a two-processor Ampere Altra Max M128-30 system for SPECpower_ssj2008, based on results published at spec.org. See Figure 9. [3]

raghu_nambiar_9-1686931522589.png

Figure 9: SPECpower_ssj2008 performance at various load levels
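The ~1.97x and ~2.73x figures follow directly from the overall ssj_ops/watt scores listed in the References section. The short calculation below reproduces them; the scores are copied from that list, while the dictionary and function names are illustrative.

```python
# Overall ssj_ops/watt scores as listed in the References section.
SSJ_OPS_PER_WATT = {
    "2P AMD EPYC 9754": 33300,
    "2P Intel Xeon Platinum 8490H": 16902,
    "2P Ampere Altra Max M128-30": 12195,
}


def efficiency_ratio(system: str, baseline: str) -> float:
    """Ratio of one system's overall ssj_ops/watt to a baseline system's."""
    return SSJ_OPS_PER_WATT[system] / SSJ_OPS_PER_WATT[baseline]


for baseline in ("2P Intel Xeon Platinum 8490H", "2P Ampere Altra Max M128-30"):
    ratio = efficiency_ratio("2P AMD EPYC 9754", baseline)
    print(f"2P AMD EPYC 9754 vs {baseline}: ~{ratio:.2f}x")
```

Because the overall score already combines the active idle and graduated load-level measurements, a single ratio of the two scores summarizes the whole utilization range.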

Conclusion

AMD is committed to our partners. We understand the need to address the evolution of the various market segments and verticals that our partners serve. We continue to innovate products that are specialized for targeted segments, and today’s introduction of AMD EPYC 97x4 processors is yet another testament to our ongoing commitment.

AMD offers guidance around the best CPU tuning practices to achieve optimal performance on these key workloads when deploying 4th Gen AMD EPYC processors for your environment. Please visit AMD EPYC™ Server Processors to learn more.

The launch of 4th Gen AMD EPYC processors in November of 2022 marked the debut of the world’s highest-performance server processor that delivers optimal TCO across workloads, industry leadership x86 energy efficiency [4] to help support sustainability goals, and Confidential Computing across a rich ecosystem of solutions. The advent of AMD EPYC 97x4 processors and AMD EPYC 9004 processors with AMD 3D V-Cache™ technology expands the line of 4th Gen AMD EPYC processors with new processor models that are optimized for cloud infrastructure and memory-bound workloads, respectively.

Other key AMD technologies include:

  • AMD Instinct™ accelerators are designed to power discoveries at exascale to enable scientists to tackle our most pressing challenges.
  • AMD Pensando™ solutions deliver highly programmable software-defined cloud, compute, networking, storage and security features wherever data is located, helping to offer improvements in productivity, performance and scale compared to current architectures with no risk of lock-in.
  • AMD FPGAs and Adaptive SoCs offer highly flexible and adaptive FPGAs, hardware adaptive SoCs, and the Adaptive Compute Acceleration Platform (ACAP) processing platforms that enable rapid innovation across a variety of technologies from the endpoint to the edge to the cloud. (https://www.amd.com/en/solutions/infrastructure-acceleration)

Raghu Nambiar is a Corporate Vice President of Data Center Ecosystems and Solutions for AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

References

  1. Cloud-Native Workloads on AMD EPYC™ 9754 Processors: https://www.amd.com/system/files/documents/amd-epyc-9754-pb-cloud-native-workloads.pdf
  2. Results as of 6/16/2023 are published at:
  3. Power Efficiency of AMD EPYC™ 9754 Processors SPECpower_SSJ® 2008: https://www.amd.com/system/files/documents/amd-epyc-9754-pb-spec-power.pdf
  4. Results as of 6/16/2023 are published at: (i) 2P AMD EPYC 9754 (128-core, SMT on), 33,300 SPECpower_ssj2008 overall ssj_ops/watt, https://www.spec.org/power_ssj2008/results/res2023q2/power_ssj2008-20230523-01264.html; (ii) 2P Intel® Xeon® Platinum 8490H (60-core), 16,902 SPECpower_ssj2008 overall ssj_ops/watt, https://www.spec.org/power_ssj2008/results/res2023q2/power_ssj2008-20230507-01251.html; (iii) 2P Ampere Altra Max M128-30, 12,195 SPECpower_ssj2008 overall ssj_ops/watt, https://www.spec.org/power_ssj2008/results/res2023q2/power_ssj2008-20230522-01258.html

     
About the Author
Raghu Nambiar currently holds the position of Corporate Vice President at AMD, where he leads a global engineering team dedicated to shaping the software and solutions strategy for the company's datacenter business. Before joining AMD, Raghu served as the Chief Technology Officer at Cisco UCS, instrumental in driving its transformation into a leading datacenter compute platform. During his tenure at Hewlett Packard, Raghu made significant contributions as an architect, pioneering several groundbreaking solutions. He is the holder of ten patents, with several more pending approval, and has made extensive academic contributions, including publishing over 75 peer-reviewed papers and 20 books in the LNCS series. Additionally, Raghu has taken on leadership roles in various industry standards committees. Raghu holds dual Master's degrees from the University of Massachusetts and Goa University, complemented by completing an advanced management program at Stanford University.