Alliances Form to Oppose NVIDIA’s Stranglehold on AI Systems—Will Fragmentation Play into NVIDIA’s Hand?
By Paul Schell | 08 Oct 2024 | IN-7563
Industry Initiatives with Buy-in | NEWS
The past year has seen several industry initiatives convene to target diverse areas of the Artificial Intelligence (AI) hardware and software stacks where NVIDIA’s head start has enabled it to dominate AI systems. Prominent examples include:
- May 2024, Ultra Accelerator Link (UALink): The promoter group backs an open standard for the scale-up of AI accelerators in data centers. Its goal is to standardize hardware across numerous vendors, building on established Ethernet standards and the shared memory protocol from AMD’s Infinity Fabric. Members include AMD, Broadcom, Intel, Google, and Microsoft.
- July 2023, Ultra Ethernet Consortium (UEC): A group working on the networking stack for the scale-out of AI/High-Performance Computing (HPC) workloads, leveraging the widespread Ethernet standard and its established ecosystem of switches, Network Interface Controllers (NICs), and software. The goal is to improve network utilization and address growing AI models that require ever-larger clusters. Members include AMD, Broadcom, NVIDIA, Eviden, Hewlett Packard Enterprise (HPE), Intel, Meta, and Microsoft.
- September 2023, Unified Acceleration Foundation (UXL): An Intel-led initiative based on an evolution of oneAPI. The open programming model spans Central Processing Units (CPUs), Graphics Processing Units (GPUs), Field-Programmable Gate Arrays (FPGAs), and accelerators, aiming to ease migration away from CUDA and thereby foster more hardware competition. Members include Qualcomm, Fujitsu, Google, and Imagination Technologies. A notable absence is AMD.
- October 2023, AI Platform Alliance: An Ampere Computing-led initiative to promote alternative architectures to GPUs (read: NVIDIA and, to a lesser extent, AMD) and make AI platforms more open, efficient, and sustainable. At its center are Total Cost of Ownership (TCO) and the power/cost efficiency that alternative AI hardware needs to outcompete GPUs. Members include Cerebras, Graphcore, Kinara, and Rebellions.
- July 2021, Triton: OpenAI’s open-source software project enables developers to write code that runs across different AI accelerators, simplifying efforts to port code away from CUDA and improving portability (see the sketch after this list). Several major chipset vendors, including Intel, AMD, and Qualcomm, form part of the effort, although it is still at a relatively immature stage.
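As an illustration of the kind of portability these initiatives pursue, below is a minimal Triton sketch: a vector-addition kernel written once in Python that Triton compiles for the underlying accelerator. This is our own illustrative code, not drawn from any of the projects above; it assumes the triton and torch packages and a supported GPU.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final, partially filled block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch one kernel instance per 1,024-element block of the input.
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The same source targets NVIDIA GPUs today and, via vendor backends such as AMD’s, other accelerators, which is what makes Triton attractive as an off-ramp from hand-written CUDA kernels.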
There are several things to note. First, although NVIDIA is a member of the UEC, the body’s work competes with NVIDIA’s proprietary InfiniBand networking specification. Nonetheless, NVIDIA also offers Ethernet-based scale-out via its Spectrum-X networking platform, so it pays to stay aligned with any progress should the industry prioritize open standards. Second, AMD’s ROCm, an alternative to the Intel-led oneAPI initiative, has likely precluded it from joining the UXL alliance, lest its own investment in and promotion of its in-house GPU software stack be deemed unsuccessful or wasted. AMD has, however, made significant progress within Triton, with hardware support since last year. Third, the AI Platform Alliance is led by Ampere, an Arm IP cloud CPU vendor that competes with Intel’s and AMD’s x86-based cloud CPUs. All are competing with NVIDIA in data center AI, although not all are targeting training solutions.
Does Fragmentation Benefit NVIDIA? | IMPACT
When considering NVIDIA’s leading position in both hardware and software, there are four key areas of potential fragmentation forming around computing standards for AI-accelerated infrastructure:
- Scale-out and Networking: NVIDIA’s InfiniBand, which competes primarily with Ethernet-based networking protocols, such as those promoted by the UEC.
- Scale-up: NVIDIA’s NVLink, which will compete with the emerging UALink in interconnecting clusters of 1,000+ GPUs working in unison.
- Accelerator Software: NVIDIA’s CUDA, which will compete with UXL’s work, AMD’s ROCm stack, and OpenAI’s Triton.
- General Compute Hardware: The AI Platform Alliance, formed around Ampere’s Arm-based cloud-native CPUs for inferencing workloads, is incongruent with any x86-based open standards, as Intel and AMD would not back Ampere’s efforts to unite Arm CPUs (which compete with x86) and diverse AI Application-Specific Integrated Circuits (ASICs). Both Ampere’s and x86-based CPUs compete with NVIDIA’s GPUs for inference workloads.
On the hardware side, there is potential for fragmentation between market leader NVIDIA and these emerging standards because they are the de facto alternative to NVIDIA’s proprietary solutions. The push for open platforms and standards comes largely from hyperscalers and frontier AI developers (e.g., OpenAI) seeking to increase competition and drive down the cost of capital-intensive AI data center investments; a prominent example is the Open Compute Project (OCP) founded by Facebook/Meta. It is also linked to the desire for greater flexibility and portability, and thus the avoidance of vendor lock-in. We therefore see the potential for divergent paths between infrastructure adhering to UALink and UEC, and infrastructure using NVIDIA’s NVLink and InfiniBand. On scale-up, UALink has the critical mass to oppose NVIDIA, but has yet to release its first specification. On scale-out, the UEC is in a similar position, and even counts NVIDIA as a member. Then there is the split between x86-based inferencing solutions, such as Intel’s Xeon 6; Arm-based alternatives from Ampere and others; and Microsoft and Google, which deploy their own captive Arm-based CPUs for AI.
The potential for software fragmentation is also present, and this is where NVIDIA’s position is arguably more entrenched. Many developers were drawn to CUDA by its vast number of domain-specific libraries that exploit accelerated NVIDIA hardware with little to no extra work, long before alternative platforms like Intel’s Gaudi and AMD’s Instinct reached maturity. NVIDIA is now said to employ more software than hardware engineering talent, and its Research and Development (R&D) spending in this area amounts to billions of dollars. Previously, developers would only port their code away from CUDA given a compelling reason and, to date, doing so has been prohibitively expensive. Now, many are compelled to migrate by the TCO benefits of Intel’s Gaudi 3 and AMD’s Instinct, which is why UXL and Triton have significant buy-in: they seek to enable a relatively inexpensive way out of the costly NVIDIA moat. Here, we could see divergent paths between UXL, which is based on Intel’s oneAPI initiative for heterogeneous accelerated computing and does not include AMD as a member; Triton, which includes AMD; and NVIDIA’s CUDA. AMD’s absence from UXL hinders efforts to break the CUDA moat, especially because Triton lags oneAPI and UXL in maturity.
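The migration-cost argument can be made concrete at the framework level. As a hedged illustration (assuming a PyTorch installation built for either backend): PyTorch’s ROCm builds expose AMD GPUs through the same torch.cuda API used for NVIDIA hardware, so high-level code like the sketch below often moves between vendors unchanged, while hand-written CUDA kernels and library calls do not.

```python
import torch

# "cuda" also resolves to AMD GPUs on ROCm builds of PyTorch.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4096, 4096, device=device)
w = torch.randn(4096, 4096, device=device)

# The matmul dispatches to cuBLAS on NVIDIA or to ROCm's BLAS libraries on AMD.
y = x @ w
print(y.shape, y.device)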
Vendors Should Unite | RECOMMENDATIONS
The end goal of open initiatives and standards like UALink, UEC, and UXL is to enable the portability and flexibility that promote compute hardware choice for AI deployments in the data center: let the most performant hardware win. Ideally, chip vendors would unite around one standard per domain for all the essential, capital-intensive areas of AI server deployments outside of compute, namely scale-up, scale-out, and software. NVIDIA’s dominance in all of these aspects leaves other vendors like AMD and Intel no choice but to coalesce around a single front for each. We nonetheless see the potential for fragmentation, although consolidation offers a workaround, as similar efforts have shown: both Gen-Z and OpenCAPI transferred their assets to Compute Express Link (CXL).
NVIDIA’s head start is worth years and billions of dollars, a lead that no single vendor can overcome alone. Uniting efforts will advance vendors toward their shared goal: the portability of AI workloads across diverse accelerators, and the flexibility to choose among them.