GigaIO & AMD: Enabling computational efficiencies, scale and faster deployments of AI workloads

Jim_Greene · 07-29-2024

I always enjoy learning from people who understand how essential collaboration is to innovation. Alan Benjamin, CEO of GigaIO, is one of those people. GigaIO, a company that provides workload-defined infrastructure for AI and technical computing, made news last year by connecting 32 AMD Instinct MI210 accelerators to a single-node server called the SuperNODE. Until then, accessing 32 GPUs required four servers equipped with eight GPUs apiece, along with all the cost and latency associated with linking that extra hardware. I recently had the pleasure of speaking to Alan for the AMD EPYC TechTalk podcast series, available here. I've included some of the interview highlights in the blog post below.

The rise of generative AI has boosted the demand for high-performance computing (HPC) at a time when companies collect, store, and analyze mind-boggling amounts of information daily. As a result, data centers are under increased pressure to adopt new infrastructures that cater to these higher storage and performance requirements. But installing more expansive HPC systems involves plenty of complexity, often takes a long time and can be quite costly. Integrating or cobbling these systems together can also create choke points that reduce utilization and slow response times.

Headquartered in Carlsbad, Calif., GigaIO provides a system for scaling accelerator technologies that eliminates the higher costs, power consumption and latency that come with multi-CPU systems. In addition to SuperNODE, GigaIO offers FabreX, the dynamic memory fabric that composes rack-scale resources. Through a disaggregated composable infrastructure (DCI), GigaIO enables data centers to free compute and storage resources and share them across a cluster.

In addition to helping companies squeeze more value out of their computing resources, GigaIO has worked hard to provide something perhaps even more valuable than high performance.

"Maybe even more important than absolute performance is how easy and simple it is to set up and administer systems that go fast," Alan said. "We've had a number of the companies that are working in the inferencing space and in the augmented-training space that are desirous of having an easy methodology for being able to scale up their solution and have come to us and we tell them 'It will just work. You can just drop your existing container on a SuperNODE and you will enjoy the benefit of more GPUs.'"

To make good on the "it just works" promise, GigaIO teamed with AMD to engineer SuperNODE's hardware and software stack to include the TensorFlow and PyTorch libraries. Applications will run on SuperNODE without being rewritten.

"Those optimized containers (on the AMD Infinity Hub website) that are optimized literally for servers that have four or eight GPUs, you can drop them onto a SuperNODE with 32 GPUs and they will just run," Alan said. "In most cases you will get either 4x, or close to 4x, the performance advantage."
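To put the "4x or close to 4x" figure in context, here is a minimal sketch of the arithmetic behind it, assuming the baseline is an 8-GPU server and the target is a 32-GPU SuperNODE. The function names and the 3.6x measured speedup are hypothetical illustrations, not figures from the interview or from GigaIO.

```python
def ideal_speedup(baseline_gpus: int, scaled_gpus: int) -> float:
    """Best-case (linear) speedup when moving from baseline_gpus to scaled_gpus."""
    if baseline_gpus <= 0 or scaled_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    return scaled_gpus / baseline_gpus


def scaling_efficiency(measured_speedup: float, baseline_gpus: int, scaled_gpus: int) -> float:
    """Fraction of the ideal linear speedup that a workload actually achieved."""
    return measured_speedup / ideal_speedup(baseline_gpus, scaled_gpus)


if __name__ == "__main__":
    # Moving from an 8-GPU server to 32 GPUs gives a 4x ceiling under linear scaling.
    print(ideal_speedup(8, 32))  # 4.0

    # Hypothetical measurement (an assumption for illustration only):
    # a workload that runs 3.6x faster on 32 GPUs than on 8.
    print(f"{scaling_efficiency(3.6, 8, 32):.0%}")
```

Under these assumptions, "close to 4x" simply means the workload's scaling efficiency stays near 100 percent as the GPU count quadruples.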

GigaIO grew out of the engineering and scientific communities' need for HPC. The computation requirements in these sectors were based on CPUs and were just starting to rely more on GPUs. That triggered what has become an insatiable hunger for more GPUs, and the race to tie together larger groups of GPUs began.

As for where the HPC market is headed, Alan said that AI and large language models have of course generated a lot of growth. But more recently GigaIO has seen momentum in the augmentation area, where companies employ AI to enhance human performance. Business leaders now seek to leverage AI for everyday, practical use.

To achieve this, companies still need foundational models, but want to augment those models with their own data, in what Alan called "a retraining and fine-tuning process."

Looking back on his company's success at shattering the 8-GPU server limit, something many were skeptical could be done, Alan said GigaIO's collaboration with AMD proved to be a critical factor.

To illustrate his point, Alan told the story of how Dr. Moritz Lehmann tested SuperNODE last year on a computational fluid dynamics package designed to simulate airflow over the wings of the Concorde at landing speed. After receiving access to SuperNODE, Lehmann didn't need to change a line of code and built his model within 32 hours. Alan estimated that on conventional hardware with 8 GPUs, the task would have required more than a year.

"It was a wonderful example of the power of AMD's GPUs combined with AMD CPUs," Alan said. "It's been this type of collaboration that's been an iterative process. [Both companies have] done a very nice job working at an engineering level to be able to identify and solve technical problems."