“Integration” is the great quest of the PC industry: the idea that we can simultaneously make devices smaller, faster and more functional over time. The steady and relentless march of integration has fueled incredible advancements in mobility, discourse, politics, society, education and so much more. Integration must and will continue: the appetite for smaller, faster, better devices shows no sign of slowing. But there are challenges on the horizon, and it is now clearer than ever that radical new technologies like High-Bandwidth Memory (HBM) can be the answer.

 

However, in order to truly understand the need for HBM, we must first spend just a few minutes exploring the history of integration itself.

 

A FEW WORDS ON TRANSISTORS

All integrated circuits, like a processor or graphics chip, are built from a basic building block called the “transistor.” These transistors electrically switch between “on” and “off” to represent binary 1s and 0s. Together, a legion of these transistors organized in specific ways can perform the math necessary to make your device do whatever you're asking it to do.

 

Across the decades, engineering advancements have allowed us to fit thousands, then millions, then billions of these transistors into the small space of a processor or graphics chip. That increasing transistor density has afforded more performance, along with the opportunity for chipmakers like AMD to do away with other bulky devices by integrating them into the chip, too.

 

integration.png
IN 2011: Integration through increasing transistor density allowed AMD to combine a northbridge, a quad-core CPU and a graphics card into just one chip: the AMD A-Series APU. This one chip used up to 39% less power and 39% less space than the standalone pieces it replaced.

 

Over the decades, these engineering advancements have also been pretty darn predictable. So predictable, in fact, that we have a name for it: Moore’s Law. During his time at Fairchild Semiconductor, Gordon E. Moore famously predicted in 1965 that integrated circuits (like CPUs and GPUs) would double in density every 12 months. Moore later revised that to every 24 months in 1975, and his revised observation has generally held true for the past 40 years as innovators like AMD find new and exciting ways to pack more transistors into ever-smaller spaces.
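Expressed as a simple doubling law (our own shorthand for the revised 24-month cadence, not a formula from Moore’s paper), the transistor count after t years would grow roughly as:

```latex
% Doubling every 24 months from a starting count N_0, with t measured in years:
N(t) \approx N_0 \cdot 2^{\,t/2}
```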

 

CHALLENGES TO INTEGRATION

This year marks the 50th anniversary of Moore’s Law, and integration is facing at least one challenge: off-chip technologies ripe for integration, like DRAM, are not size- or cost-effective to absorb directly into the chip. Yet the performance, power and form factor benefits of integration remain significant, so another method of achieving it must be explored.

 

integration2.png

Moore envisioned a possible solution, and offered the following insight in the same 1965 paper that established Moore’s Law.

 

“It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected,” he wrote. “The availability of large functions, combined with functional design and construction, should allow the manufacturer of large systems to design and construct a considerable variety of equipment both rapidly and economically.”

 

A LARGE SYSTEM OF SMALLER FUNCTIONS

HBM is a new type of CPU/GPU memory (“RAM”) that vertically stacks memory chips, like floors in a skyscraper. Those towers connect to the CPU or GPU through an ultra-fast interconnect called the “interposer.” Much like sticking those famously colorful building blocks into that famous green base, several stacks of HBM are plugged into the interposer alongside a CPU or GPU, and that assembled module connects to a circuit board.

interposer.png

Though these HBM stacks are not physically integrated with the CPU or GPU, they are so closely and quickly connected via the interposer that HBM’s characteristics are nearly indistinguishable from on-die integrated RAM.
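The practical payoff of that interposer connection is interface width: each first-generation HBM stack presents a very wide, relatively slow interface, while a GDDR5 chip presents a narrow, very fast one. The sketch below uses the publicly documented first-generation figures (a 1,024-bit, roughly 1 Gbps-per-pin interface per HBM stack under JESD235, versus a 32-bit, up to 7 Gbps-per-pin interface per GDDR5 chip) purely as an illustration of how peak bandwidth falls out of those two numbers, not as product specifications.

```python
# Back-of-the-envelope peak-bandwidth comparison using publicly documented
# first-generation figures (illustrative only, not product specifications):
#   - HBM (JESD235, gen 1): 1,024-bit interface per stack at ~1 Gbps per pin
#   - GDDR5:                32-bit interface per chip at up to ~7 Gbps per pin

def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: (bus width in bits * per-pin rate in Gbps) / 8."""
    return bus_width_bits * pin_rate_gbps / 8

hbm_stack = peak_bandwidth_gb_s(bus_width_bits=1024, pin_rate_gbps=1.0)  # 128.0 GB/s per stack
gddr5_chip = peak_bandwidth_gb_s(bus_width_bits=32, pin_rate_gbps=7.0)   # 28.0 GB/s per chip

print(f"One first-gen HBM stack: {hbm_stack:.0f} GB/s")
print(f"One GDDR5 chip:          {gddr5_chip:.0f} GB/s")
```

Routing a thousand-plus signals to each stack is exactly the kind of dense wiring an interposer makes practical, which is why the two technologies arrive together.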

 

JUST THINK SMALL

GDDR5 has served the industry well these past seven years, and many gigabytes of this memory technology are used on virtually every high-performance graphics card to date.

 

But as graphics chips grow faster, their appetite for fast delivery of information (“bandwidth”) continues to increase. GDDR5’s ability to satisfy those bandwidth demands is waning as the technology reaches the limits of its specification. Each additional gigabyte per second of bandwidth now consumes too much power to be a wise, efficient, or cost-effective choice for designers or consumers.

curve.png

Taken to its logical conclusion, this trend means GDDR5 could easily begin to stall the continued performance growth of graphics chips. HBM resets the clock on memory power efficiency, offering >3.5X the bandwidth per watt of GDDR5 with both superior bandwidth and lower power consumption.1

efficiency.PNG
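As footnote 1 notes, that efficiency figure is simply bandwidth delivered divided by power drawn on the memory rails. As a purely illustrative check: 320 GB/s is the Radeon™ R9 290X’s published memory bandwidth, and a memory-rail draw of roughly 30 W (an assumed figure used here only to show the arithmetic) lands on the footnoted result.

```latex
% Bandwidth-per-watt efficiency as defined in footnote 1 (illustrative numbers):
\text{efficiency} = \frac{\text{bandwidth (GB/s)}}{\text{memory power (W)}},
\qquad \frac{320~\text{GB/s}}{\sim 30~\text{W}} \approx 10.7~\text{GB/s per watt}
```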

Consider also the sheer area taken up by GDDR5. Whereas 1GB of GDDR5 on a graphics card might require 672 square millimeters of room on a circuit board, the same quantity of HBM requires just 35 square millimeters of space—a 94% space savings.3 For now we ask that you imagine the possibilities of a product no longer governed by the size and quantity of its memory chips, or all the power circuitry required to get them up to speed.

size.PNG
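The quoted savings follow directly from the measurements in footnote 3:

```latex
% Board-area savings of 1GB HBM (35 mm^2) versus 1GB GDDR5 (672 mm^2), per footnote 3:
\frac{672~\text{mm}^2 - 35~\text{mm}^2}{672~\text{mm}^2} \approx 0.948
\quad\Longrightarrow\quad \text{roughly the quoted 94\% space savings}
```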

 

FOR THE GOOD OF ALL OF US

High-Bandwidth Memory and the high-volume manufacturable interposer are technologies invented and proposed by AMD over seven years ago. We have spent the ensuing years gaining expert allies in the interconnect and memory technology industries to help us perfect, manufacture, and standardize the technology for use across the PC industry.

 

SK hynix is one of those allies, and their memory manufacturing techniques have helped miniaturize and package key aspects of the memory to make it suitable for cost-effective mass production. With HBM, SK hynix has once again proven itself a leader in the manufacture of cutting-edge memory technology.

 

We also owe gratitude to ASE, Amkor and UMC, who were instrumental in the realization of our initial interposer design. Though interposers are not a new technology, an interposer suitable for interconnecting HBM and high-performance ASICs is new, and years of work were required to reach mass production.

 

The JEDEC Solid State Technology Association is another key ally. This consortium of 300+ engineering and silicon design firms (like AMD) helps standardize specifications, implementation and testing of new memory technologies like HBM. JEDEC’s specialty is in open standards, which permit any and all companies to freely manufacture a technology if they follow the letter and the spirit of the standard.

 

AMD strongly believes in contributing revolutionary new technologies to the world as open industry standards. The proliferation of AMD proposals like GDDR5, DisplayPort™ Adaptive-Sync, HSA, low-overhead graphics APIs and Wake-on-LAN (to name a few) is evidence of this. HBM is the latest entry in this rich history of open innovation and, with JEDEC specification JESD235, High-Bandwidth Memory is now freely available to all JEDEC members.

 

LOOKING TO THE FUTURE

Thanks to nearly a decade of engineering work by AMD and its technology partners, HBM and the interposer smash through the power, performance and form factor boundaries erected by GDDR5. The way is paved for more compact, high-performance devices for years to come! We’re thrilled to announce that you won’t have to wait very long to bring HBM into your home, either. HBM and the future of chip design will be available in an AMD product as soon as this summer.

 

Robert Hallock is the Head of Global Technical Marketing at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.

 


FOOTNOTES:

1. Testing conducted by AMD engineering on the AMD Radeon™ R9 290X GPU vs. an HBM-based device. Data obtained through isolated direct measurement of GDDR5 and HBM power delivery rails at full memory utilization. Power efficiency calculated as GB/s of bandwidth delivered per watt of power consumed. AMD Radeon™ R9 290X (10.66 GB/s bandwidth per watt) and HBM-based device (35+ GB/s bandwidth per watt), AMD FX-8350, Gigabyte GA-990FX-UD5, 8GB DDR3-1866, Windows 8.1 x64 Professional, AMD Catalyst™ 15.20 Beta. HBM-1

2. Testing conducted by AMD engineering on the AMD Radeon™ R9 290X GPU vs. an HBM-based device. Data obtained through isolated direct measurement of GDDR5 and HBM power delivery rails at full memory utilization.  AMD Radeon™ R9 290X and HBM-based device, AMD FX-8350, Gigabyte GA-990FX-UD5, 8GB DDR3-1866, Windows 8.1 x64 Professional, AMD Catalyst™ 15.20 Beta. HBM-3

3. Measurements conducted by AMD Engineering on 1GB GDDR5 (4x256MB ICs) @ 672mm2 vs. 1GB HBM (1x4-Hi) @ 35mm2. HBM-2