When it comes to HPC, Memory Matters

Blog Post created by forrest.norrod Employee on Jun 26, 2018

High Performance Computing (HPC) is one of the most important and fastest-growing markets in the datacenter. The term is perhaps overused, but HPC, meaning the application of massive computing resources to complex problems, has become critical well beyond its start in scientific research. Workloads in finance, retail, oil and gas, weather, engineering, and education all leverage HPC today. Common to many of these applications is the importance of memory and I/O bandwidth.


A large percentage of HPC workloads depend on memory bandwidth, because the problems being addressed often don’t fit into caches the way other applications’ data can. Insufficient memory bandwidth or insufficient memory capacity can leave CPU compute engines waiting idle. You can have the most CPU cores in the world, but if they aren’t fed the right data in an efficient manner, they can’t do useful work. The situation is analogous to race cars: you can have the biggest engine ever made under the hood, but if a tiny fuel line can’t provide enough fuel to the engine, the car won’t go very fast.
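The fuel-line analogy can be made concrete with a simple roofline-style estimate, in which attainable throughput is the smaller of peak compute and memory bandwidth times the work done per byte moved. The sketch below is only an illustration: the machine figures (1000 GFLOP/s, 100 GB/s) and the function name are hypothetical and do not describe any particular processor.

```python
def attainable_gflops(peak_gflops: float, mem_bw_gbs: float, flops_per_byte: float) -> float:
    """Roofline bound: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Hypothetical machine, chosen only to make the arithmetic easy to follow.
PEAK_GFLOPS = 1000.0
MEM_BW_GBS = 100.0

# A streaming kernel such as a[i] = b[i] + s * c[i] performs 2 flops while
# moving three 8-byte doubles (two loads, one store), i.e. 2 flops per 24 bytes.
streaming_intensity = 2 / 24

# A cache-friendly kernel that reuses data heavily (assumed 50 flops per byte).
dense_intensity = 50.0

streaming = attainable_gflops(PEAK_GFLOPS, MEM_BW_GBS, streaming_intensity)
dense = attainable_gflops(PEAK_GFLOPS, MEM_BW_GBS, dense_intensity)
streaming_2x_bw = attainable_gflops(PEAK_GFLOPS, 2 * MEM_BW_GBS, streaming_intensity)

print(f"streaming kernel:          {streaming:.1f} GFLOP/s")
print(f"dense kernel:              {dense:.1f} GFLOP/s")
print(f"streaming, 2x bandwidth:   {streaming_2x_bw:.1f} GFLOP/s")
```

Under these assumed numbers the streaming kernel uses under 1% of peak compute, and doubling memory bandwidth doubles its throughput, while adding cores would not help it at all.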


Beyond memory bandwidth, you also need enough Input/Output (I/O) bandwidth to ensure that data can get in and out of the CPU and memory. Critical I/O interfaces to storage and the network, be it Ethernet or InfiniBand, are usually connected via PCIe. Bandwidth and latency on those interfaces can quickly become the bottleneck in systems with overloaded PCIe links. When the system is balanced optimally, jobs load and run faster, you can do deeper analysis to get better results, and/or you need fewer systems to complete that analysis.
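To see how a PCIe link becomes overloaded, it helps to put rough numbers on it. The sketch below assumes PCIe Gen 3 (8 GT/s per lane with 128b/130b encoding) and ignores protocol overhead, so real throughput is somewhat lower. The 100 Gb/s adapter is a hypothetical example device, and the function names are illustrative.

```python
GEN3_GT_PER_S = 8.0               # PCIe 3.0 signaling rate per lane (gigatransfers/s)
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding used by PCIe 3.0


def pcie_gen3_gbytes_per_s(lanes: int) -> float:
    """Approximate raw one-direction bandwidth of a Gen 3 link, in GB/s."""
    return GEN3_GT_PER_S * ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte


def oversubscription(demand_gbytes_per_s: float, lanes: int) -> float:
    """Ratio of device demand to link capacity; above 1.0 the link is the bottleneck."""
    return demand_gbytes_per_s / pcie_gen3_gbytes_per_s(lanes)


# Hypothetical 100 Gb/s network adapter: 100 / 8 = 12.5 GB/s at line rate.
nic_demand = 100 / 8

for lanes in (8, 16):
    ratio = oversubscription(nic_demand, lanes)
    status = "bottleneck" if ratio > 1.0 else "ok"
    print(f"x{lanes}: {pcie_gen3_gbytes_per_s(lanes):.2f} GB/s link, "
          f"demand/capacity = {ratio:.2f} ({status})")
```

With these assumptions, the adapter saturates an x8 link but fits within an x16 link, which is why the number of lanes available per device matters.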


In recent years, PCIe connections have also been increasingly used to extend the compute capability of the system by connecting to GPUs or FPGA-based accelerators. Many applications scale well with the vector math capabilities of GPUs or by dedicating logic in FPGAs to the inner loops of critical algorithms. Machine learning, perhaps the most important emerging application area, is where “heterogeneous” systems with high-performance CPUs and accelerators are the right answer.


All of this thinking went into the design of the AMD EPYC™ processor, and it shows. EPYC is an architecture built for the workloads and applications of current and future datacenters.


  • AMD EPYC has up to 33% more memory bandwidth per core than the competition to keep data flowing to the processors¹;
  • A 2P AMD EPYC 7601 system offers up to 2.6x the memory capacity of a 2P Intel Xeon Platinum 8180 system²;
  • All AMD EPYC processors support up to 128 PCIe lanes so that I/O does not become a bottleneck³;
  • EPYC has outstanding floating-point capabilities, with world record performance in multiple floating-point benchmarks and real HPC applications⁴;
  • Single- and dual-socket EPYC-based server solutions allow up to six GPUs or FPGAs to be attached to the CPU with enough lanes left over for high-speed storage devices and high-speed Ethernet or InfiniBand connections.
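The arithmetic behind the first three bullets and the last one can be checked from the footnoted figures. The sketch below makes a few assumptions that the post does not state: each DDR4 channel is 64 bits wide (8 bytes per transfer), 2TB is counted as 2048GB, and each accelerator uses a x16 link. It compares theoretical per-socket peaks only. A per-core figure would also depend on each part's core count, which the footnotes do not give.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit DDR4 channel


def peak_mem_bw_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak memory bandwidth per socket, in GB/s."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


# Footnote 1: 8 channels vs. 6 channels of DDR4-2667.
epyc_bw = peak_mem_bw_gbs(8, 2667)
xeon_bw = peak_mem_bw_gbs(6, 2667)
bw_gain = epyc_bw / xeon_bw - 1

# Footnote 2: 2TB vs. 768GB per processor, two processors each.
epyc_2p_capacity_gb = 2 * 2048
xeon_2p_capacity_gb = 2 * 768
capacity_ratio = epyc_2p_capacity_gb / xeon_2p_capacity_gb

# Footnote 3 and the last bullet: six accelerators out of 128 lanes.
TOTAL_LANES = 128
LANES_PER_ACCELERATOR = 16  # assumption: one x16 link per GPU or FPGA
lanes_left = TOTAL_LANES - 6 * LANES_PER_ACCELERATOR

print(f"peak bandwidth: {epyc_bw:.1f} vs {xeon_bw:.1f} GB/s ({bw_gain:.0%} more)")
print(f"2P capacity ratio: {capacity_ratio:.2f}x")
print(f"lanes left after six x16 accelerators: {lanes_left}")
```

Under these assumptions, 32 lanes remain after six accelerators, which is the headroom the last bullet refers to for storage and network devices.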


Many AMD EPYC platforms on the market today deliver outstanding performance on memory-bound workloads. For virtualized and memory-centric solutions, both HPE and Dell offer 2U rack-based systems: the HPE ProLiant DL385 Gen10 and the Dell PowerEdge R7425. For ultra-dense compute solutions, Supermicro, Cray and Cisco offer four-node 2U (4N/2U) systems: the Supermicro AS -2123BT-HNC0R, the Cray CS500 and the Cisco UCS C4200/C125.


AMD EPYC has been met with great excitement by the market, and its balanced architecture delivers world record performance. And looking ahead, we have a strong roadmap that is primed to deliver premium performance and innovation for years to come.





1 AMD EPYC™ 7601 processor supports up to 8 channels of DDR4-2667, versus the Xeon Platinum 8180 processor at 6 channels of DDR4-2667. NAP-42


2 A single AMD EPYC™ 7601 processor offers up to 2TB/processor (x 2 = 4TB), versus a single Xeon Platinum 8180 processor at 768GB/processor (x 2 = 1.54TB). NAP-44


3 AMD EPYC™ processor supports up to 128 PCIe® Gen 3 I/O lanes (in both 1- and 2-socket configurations), versus the Intel® Xeon® SP Series processor supporting a maximum of 48 lanes PCIe® Gen 3 per CPU, plus 20 lanes in the chipset (max of 68 lanes on 1 socket and 116 lanes on 2 socket). NAP-56





Cautionary Statement

This blog contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) including, but not limited to, the strength, expectations and benefits regarding AMD’s technology roadmap, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects" and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this blog are based on current beliefs, assumptions and expectations, speak only as of the date of this blog and involve risks and uncertainties that could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and statements. Investors are urged to review in detail the risks and uncertainties in AMD's Securities and Exchange Commission filings, including but not limited to AMD's Quarterly Report on Form 10-Q for the quarter ended March 31, 2018.