Author: Raja Swaminathan, Senior Fellow, Advanced Packaging Leader, AMD
AMD’s move to chiplet-based architectures drives its CPU/GPU roadmaps and relies heavily on next-generation die-to-die interconnect schemes, enabled by new advanced packaging technologies. Many package architectures exist to enable die-to-die interconnects across various product segments. Examples include multi-chip modules (MCM), 2D organic redistribution layer (RDL) based architectures (like InFO or FoCoS, used in mobile markets), 2D silicon-based architectures (like AMD’s Si interposer and Intel’s EMIB, used in high-performance computing), and 3D active-on-active silicon stacking (primary examples include Intel’s Foveros and AMD’s 3D chiplet architectures). But the choice of chiplet package architecture is not one-size-fits-all; it is made per product, based on that product’s specific Performance, Power, Area, and Cost requirements.
AMD’s investments and innovations in packaging have been a multi-year, multi-technology journey. AMD introduced High Bandwidth Memory (HBM) and 2.5D silicon interposer technology to the GPU market in 2015, which led the industry for best memory bandwidth in a small form factor[i]. We then set a new performance trajectory for compute in the data center and PC markets in 2017, with the introduction of MCM packaging. In 2019, we introduced industry-first chiplet-based technology, using different process nodes for the cores and the IO within the same package, thus enabling significantly higher performance and capabilities. At Computex 2021, President and CEO Dr. Lisa Su announced the next big step in AMD’s continued trajectory of pushing the limits of advanced packaging: 3D chiplets. Developed in collaboration with TSMC, this architecture combines AMD chiplet packaging with die stacking to create a 3D chiplet architecture for future high-performance computing products.
AMD is using an industry-first hybrid bond plus through-silicon-via (TSV) approach that provides over 200x the interconnect density of AMD 2D chiplets and more than 15x the density of existing Intel 3D stacking solutions[1]. This enables a much more efficient and denser integration of AMD IP blocks. The die-to-die interface uses a direct copper-to-copper bond with no solder bumps of any kind. This approach dramatically improves transistor density and interconnect pitch over other 3D approaches and is arguably the most flexible active-on-active silicon stacking technology in the world. With these smaller TSVs and the direct copper-to-copper connections, this technology consumes less than one-third of the energy per signal of competitive micro-bump 3D approaches[2] while enabling 2 TB/s of total SRAM-CCD bandwidth. AMD’s 3D chiplet architecture has been carefully engineered to deliver the highest bandwidth in the lowest silicon area, using direct copper-to-copper hybrid bonding plus TSVs for die-to-die communication. The architecture and silicon floor plan are also engineered for optimized thermal performance. For example, AMD placed the stacked 64MB SRAM die directly over the L3 SRAM cells of the core complex die (CCD), rather than over the CPU cores, to keep thermal density low. We also added structural silicon to let heat escape from the higher-density cores on the CCD, illustrating how 3D chiplet stacking can be done in a thermally friendly manner. This revolutionary technology is a key part of how AMD will push the envelope in high-performance computing over the coming years.
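To see why energy per signal matters at this bandwidth, the short Python sketch below converts a link bandwidth and an energy-per-bit figure into interface power. The 2 TB/s figure comes from the text; the energy-per-bit values are purely hypothetical placeholders chosen to illustrate the one-third ratio, not AMD or Intel specifications, and the function name is our own.

```python
def link_power_watts(bandwidth_tb_per_s: float, energy_pj_per_bit: float) -> float:
    """Power drawn by a die-to-die link running at full bandwidth.

    bandwidth_tb_per_s: link bandwidth in terabytes per second (1 TB = 1e12 bytes).
    energy_pj_per_bit:  energy per transferred bit in picojoules (hypothetical input).
    """
    bits_per_second = bandwidth_tb_per_s * 1e12 * 8
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit


# Hypothetical energy figures, chosen only so that one is a third of the other.
MICRO_BUMP_PJ_PER_BIT = 0.3   # assumption, not a published value
HYBRID_BOND_PJ_PER_BIT = 0.1  # assumption, not a published value

micro_bump_watts = link_power_watts(2.0, MICRO_BUMP_PJ_PER_BIT)
hybrid_bond_watts = link_power_watts(2.0, HYBRID_BOND_PJ_PER_BIT)
```

Under these assumed inputs, the same 2 TB/s of traffic costs about three times less interface power on the lower-energy link. The ratio follows directly from the energy-per-bit ratio and does not depend on the absolute values chosen.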
AMD’s 3D architecture is enabled by a novel process called hybrid bonding, developed with TSMC and leveraging their 3DFabric technology. Hybrid bonding is fundamentally a two-phase bonding approach. In the first phase, an initial hydrophilic dielectric-to-dielectric bond is created at room temperature, followed by an annealing step in which activated dangling bonds of functional groups are covalently bonded. The second phase is direct copper-to-copper bonding, enabled by the same (or a subsequent) anneal step, in which the copper bonds are formed by solid-state diffusion.
Solder-based micro-bump technology with tall TSVs, which other processor manufacturers use, is based on traditional solder-based packaging technologies and can scale from a 50 µm to a 36 µm pitch (perhaps slightly lower, which is adequate for low-bandwidth applications). AMD’s 3D chiplet architecture, by contrast, uses silicon-fabrication-like manufacturing methods with TSVs built to back-end design rules and copper-only interconnects, with no solder present. This is a transformational point in the industry’s advanced packaging journey: interconnect technologies are now being built with silicon fabrication techniques to enable extreme-bandwidth architectures. As a result of this extreme scaling, we are also able to achieve more than three times the interconnect energy efficiency, more than 15x the interconnect density, and better signal and power performance compared to competitive micro-bump 3D architectures.
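The link between pitch and density can be made concrete with a little arithmetic. The Python sketch below assumes connections sit on a uniform square grid, so areal density scales as one over pitch squared. That grid model and the function names are our assumptions; the article's footnote says only that the density figures are based on areal density and bump pitch. Using the finer end of the 50 to 36 micron micro-bump range as the reference point is likewise an illustrative choice, not necessarily the pitch used in AMD's own calculation.

```python
import math


def areal_density_per_mm2(pitch_um: float) -> float:
    """Connections per square millimetre on a square grid (assumed layout)."""
    connections_per_mm = 1000.0 / pitch_um
    return connections_per_mm ** 2


def density_ratio(coarse_pitch_um: float, fine_pitch_um: float) -> float:
    """How many times denser the fine-pitch grid is than the coarse-pitch grid."""
    return (coarse_pitch_um / fine_pitch_um) ** 2


def pitch_for_ratio(reference_pitch_um: float, ratio: float) -> float:
    """Pitch needed to reach `ratio` times the density of the reference pitch."""
    return reference_pitch_um / math.sqrt(ratio)


# Micro-bump scaling from 50 to 36 microns: a bit under 2x denser.
micro_bump_gain = density_ratio(50.0, 36.0)

# Pitch implied by a 15x density gain over a 36 micron grid (illustrative reference).
implied_pitch_um = pitch_for_ratio(36.0, 15.0)
```

Under the square-grid assumption, shrinking micro-bumps from 50 to 36 microns buys less than a 2x density gain, while a 15x gain over a 36 micron grid implies a pitch of roughly 9.3 microns. This is why a jump of that size calls for a different bonding technology rather than further solder-bump scaling.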
Many popular games today place intense demands on the PC’s memory subsystem, so they benefit from a deep pool of memory on the processor. As we continue to test AMD 3D chiplet prototypes against a long list of games, we’re seeing an average improvement of 15% at 1080p resolution[3]. This 15% improvement is an entire architectural generation’s worth of gaming performance from the 3D chiplet technology alone, and it illustrates the power of advanced packaging technologies. We are seeing similar performance benefits on technical compute workloads, some of which were detailed at last year’s AMD Accelerated Data Center event on November 8th.
However, 3D cache stacking over CPU cores is just the beginning of the AMD 3D package journey. The future of 3D stacking is a function of TSV pitch and can spawn many architectural innovations, including IP-on-IP stacking, macro-on-macro stacking, IP folding/splitting, and circuit-level slicing.
2.5D package innovations remain critical for enabling advanced chiplet architectures, and our new 2.5D “Bridge” architecture, Elevated Fanout Bridge (EFB), in the AMD Instinct™ MI200 accelerators demonstrates another AMD advancement in this space. When connecting to HBM, higher-density micro-bumps are required for the high signal densities. Traditionally, a silicon interposer was deployed with micro-bumps to support the high-density interconnect. This approach requires a large silicon substrate to support the entire silicon plus HBM assembly above it. It must also include TSVs to connect the compute die to the signals in the silicon interposer.
An alternative to the interposer approach is the substrate-embedded 2.5D bridge architecture. These bridges provide localized interconnects and have better electrical characteristics than Si interposer designs because they have no TSVs. However, they are limited in other ways: the manufacturing flow is complex, and the organic package substrate needs a cavity to make room for the silicon bridge.
With the EFB 2.5D bridge innovation, AMD gets the electrical benefits of the bridge approach while avoiding the complexity and expense of carving cavities out of the substrate. AMD also achieves better placement precision by using a wafer-level process with a lithographically defined architecture that is much more scalable than the substrate-embedded 2.5D bridge. This approach allows us to use standard substrates and standard flip-chip processes, which have lower complexity and fewer capacity challenges in substrate manufacturing, as well as in bumping and assembly.
3D stacking technology progression, along with other advanced packaging techniques like EFB, will enable “Beyond Moore’s Law” scaling this decade and support complex heterogeneous integration schemes not possible even with monolithic designs. Welcome to the future!
[1] EPYC-26. Based on calculated areal density and on bump pitch of AMD hybrid bond AMD 3D V-Cache stacked technology compared to AMD 2D chiplet technology and Intel 3D stacked micro-bump technology.
[2] EPYC-27. Based on AMD internal simulations and published Intel data on “Foveros” technology specifications.
[3] Testing by AMD performance labs as of April 28, 2021, based on the average FPS of 32 PC games at 1920x1080 with the High image quality preset using an AMD Ryzen™ 9 5900X processor vs. 12-Core 3D Chiplet Prototype. Results may vary. R5K-078.