NVIDIA's MGX to Supercharge Enterprise Data Centers, Targeting AI, HPC, and Omniverse Applications
NEWS
Traditional data centers addressed increasing workload demands by simply adding new servers and Central Processing Unit (CPU) cores. However, the current demands of data-intensive workloads make this practice almost impossible to sustain. The introduction of intelligent accelerators has helped data center operators optimize server performance while lowering operating costs: offloading network functions from CPUs to accelerators, such as Data Processing Units (DPUs), reduces power consumption.
At COMPUTEX 2023, NVIDIA unveiled NVIDIA MGX, a modular reference architecture that enables data center operators to quickly and cost-effectively build more than 100 server variations optimized for various Artificial Intelligence (AI), High-Performance Computing (HPC), and Omniverse applications. The modular nature of MGX enables enterprises to build the foundation for accelerated computing by selecting the Graphics Processing Units (GPUs), DPUs, and CPUs that best address their workloads, including AI training, Machine Learning (ML), Large Language Models (LLMs), edge computing, and 5G.
Early industry collaborators include QCT and Supermicro: QCT is deploying the MGX architecture in its S74G-2U system, while Supermicro will integrate the NVIDIA Grace CPU Superchip via the MGX architecture in its ARS-221GL-NR system, a high-density 2U server designed for HPC and LLM applications. The MGX architecture will also be rolled out to multiple SoftBank data centers, focused on allocating GPU resources between generative AI and 5G applications.
Mix and Match: Open Server Architecture Allows MGX to Host Multiple Accelerators Alongside CPU Cores
IMPACT
The core of MGX’s modular architecture is similar to the concept of the disaggregated data center, in which large monolithic hardware is broken down into smaller, more agile components designed and optimized for specific processes or workloads. While the network, storage, and compute functions are designed to work together in a data center, these components are managed independently of each other. The disaggregated data center concept offers many benefits, including improved utilization rates, greater scalability, and flexibility; however, interoperability and complexity are two main challenges that arise from this design approach.
NVIDIA’s MGX architecture aims to solve these interoperability and complexity challenges by providing interoperable DPUs, GPUs, and CPUs, packaged together with a full software stack and wrapped up in a single, optimized server available in various form factors. This ensures that customers reap the benefits of a modular architecture without having to deal with system design challenges, component compatibility issues, or latency management.
While it is still early days for measuring the impact of NVIDIA’s MGX architecture, Original Equipment Manufacturers (OEMs) like QCT and Supermicro can benefit from adopting an architecture that allows different configurations of CPUs, GPUs, and DPUs to cater to different accelerated computing workloads, all on a single platform. This reduces design and research effort for the OEMs and provides faster time to market. The savings can then be passed on to enterprises that are exploring AI use cases and may not want to invest large amounts of monetary and human resources in such projects.
Data Centers Need to Consider Accelerators to Keep Up with Current Workload Demands
RECOMMENDATIONS
ABI Research foresees two distinct sets of customers that can benefit from NVIDIA’s MGX open server architecture:
- OEMs: Server manufacturers, such as Dell, Lenovo, and HPE, can take advantage of the robust NVIDIA ecosystem and support, enabling them to design and build accelerated computing systems. Server manufacturers can now package AI-specific GPU accelerators supporting both x86 and Arm processors together with DPUs that offload network functions, housed in various server form factors and delivered quickly to customers.
- Private Enterprises: The rise of AI, ML, edge computing, the Internet of Things (IoT), and related technologies has forced enterprises to rethink their technology strategies and digital transformation initiatives to ensure they don’t fall behind. Chief Information Officers (CIOs) and Chief Technology Officers (CTOs) will have to balance the need to embrace new technologies with the cost of technology deployment. NVIDIA’s MGX solution strikes this balance: a platform supporting AI/ML workloads can be deployed almost immediately at relatively low cost, with the flexibility to replace or add accelerators as and when business demand arises. The multi-generational compatibility of the architecture not only ensures the relevance of existing designs, but also enables the seamless adoption of next-generation hardware without having to redesign the entire system.
The MGX open server architecture arrives at an inflection point in technological advancement. Data centers are under pressure to deliver high performance and system flexibility due to customers' rapidly evolving and diverging needs worldwide. There are also calls from the developer community for a common architecture and for the accelerated computing market to work together on open standards. These initiatives will be important to catalyze the growth of the ecosystem and future-proof the applications and solutions that have been developed.
ABI Research believes that NVIDIA’s MGX architecture will benefit existing NVIDIA customers, as well as greenfield customers embarking on projects that involve AI, ML, machine vision, and similar workloads and looking for solutions to kick-start the journey. Enterprises that are more advanced in their accelerated computing maturity and have already built and implemented specific system designs can also adopt the MGX architecture as a complement to their existing architecture. A mass migration to the MGX architecture is highly unlikely, but there will be areas where it can be implemented alongside existing data center designs.