The Increasingly Inefficient Memory Requirements of Machine Learning Workloads
NEWS
A number of issues regarding the implementation and use of Machine Learning (ML) models should worry everyone. Model complexity doubles roughly every 3 months, placing compute and memory capabilities under immense pressure. Increasingly complex models will require ever bigger data centers, where many of these models are trained, packed with servers running at full speed, with clear and increasingly costly repercussions for the environment. By 2025, players like OpenAI expect that half or more of all modeling data, as well as the servers used to train ML models, will be based in the cloud, which means even bigger data centers, yet more demand for compute and memory capacity, and, importantly, more carbon emissions. Prior ABI Insights discuss the potential benefits of neuromorphic models of computation to ease memory loads, but solutions within conventional compute systems are also available in the market today. Enter Astera Labs and its Leo Memory Connectivity Platform, which, according to the company, is the industry's first Compute Express Link (CXL) memory controller and is designed to make more efficient use of memory resources.
Astera Labs and AMD Team Up to Expand Memory Capacity
IMPACT
Semiconductor company Astera Labs proposes a purpose-built, hardware-based solution meant to tackle bandwidth, capacity, and performance bottlenecks by employing a smart memory controller capable of expanding and pooling up to 2 Terabytes (TB) of memory. The platform is not just a means of increasing memory capacity, but also an attempt to use memory resources more effectively and efficiently: it supports a composable, disaggregated infrastructure that improves memory capacity in three ways, namely memory expansion, the pooling of memory resources among different ML processes, and the sharing of overall memory capacity by various ML processes. Two main product lines are on offer to this end: 1) the E-Series, which provides memory expansion; and 2) the P-Series, which, in addition to memory expansion, also offers memory pooling and sharing.
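To illustrate why pooling and sharing matter, the following is a minimal Python sketch, not Astera Labs software, with invented capacities and workload sizes. It contrasts statically attached memory, where each server is limited to its own fixed capacity, with a shared pool of the same total size that lends capacity to whichever workload needs it.

# Minimal illustration of memory pooling vs. static per-server memory.
# All numbers are hypothetical; this is not Astera Labs software.

def static_fit(per_server_gb, demands_gb):
    # Each workload must fit within a single server's fixed local memory.
    return [d <= per_server_gb for d in demands_gb]

def pooled_fit(pool_gb, demands_gb):
    # Workloads draw from one shared pool until its capacity is exhausted.
    results, used = [], 0
    for d in demands_gb:
        ok = used + d <= pool_gb
        results.append(ok)
        if ok:
            used += d
    return results

demands = [96, 160, 320, 400]        # GB needed by four hypothetical ML jobs
servers, per_server = 4, 256         # four servers, 256 GB attached to each
pool = servers * per_server          # same total capacity, but pooled (1,024 GB)

print("static:", static_fit(per_server, demands))  # [True, True, False, False]
print("pooled:", pooled_fit(pool, demands))        # [True, True, True, True]

With the same 1,024 GB of total memory, the pooled configuration serves all four jobs, while the static configuration strands capacity on lightly loaded servers; this kind of utilization gain is what pooling and sharing target.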
Some of the results of this approach can be seen in Astera Labs' market valuation, which topped US$3 billion after a Series D funding round in 2022, as well as in some of the partnerships it has fostered, most notably with AMD. In the latter case, Astera Labs leverages AMD's 4th Generation EPYC processors to realize the promise of CXL, namely high-speed connections between Central Processing Units (CPUs) and devices on the one hand, and between CPUs and memory on the other, with high-performance data center servers squarely in mind. The idea is to combine Astera's Leo Memory Connectivity Platform with the AMD EPYC 9004 Series processors to enable plug-and-play connectivity in new composable architectures, all powered by CXL technology. The explicit goal is to improve data center operations by reducing memory bandwidth and capacity bottlenecks, thereby driving down Total Cost of Ownership (TCO).
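How CXL-attached memory is consumed in practice varies by platform, but on Linux hosts it is typically exposed as an additional, often CPU-less, Non-Uniform Memory Access (NUMA) node. The short sketch below is a generic illustration, not Astera Labs or AMD tooling: it lists the NUMA nodes a system reports and flags any node that has memory but no CPUs, which is how such expansion memory commonly appears.

# List NUMA nodes and flag CPU-less nodes, which is how CXL-attached
# expansion memory commonly appears on Linux. Illustrative sketch only.
import glob
import os

nodes = glob.glob("/sys/devices/system/node/node[0-9]*")
for node in sorted(nodes, key=lambda p: int(p.rsplit("node", 1)[-1])):
    name = os.path.basename(node)
    with open(os.path.join(node, "cpulist")) as f:
        cpus = f.read().strip()                  # empty string => no CPUs on this node
    mem_kb = 0
    with open(os.path.join(node, "meminfo")) as f:
        for line in f:
            if "MemTotal" in line:
                mem_kb = int(line.split()[-2])   # value is reported in kB
    kind = "CPU-less (possible CXL expansion memory)" if not cpus else f"CPUs {cpus}"
    print(f"{name}: {mem_kb // (1024 * 1024)} GiB, {kind}")

A workload can then be bound to such a node with standard tools (for example, numactl --membind) or left to the kernel's memory-tiering policies.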
Encouraging, Given the Ever-Increasing Complexity of Machine Learning, but More Is Needed
RECOMMENDATIONS
One of the maladies of the ML market is the belief that bigger is always better, that scaling up models will always result in better models and, thus, better market opportunities. There is certainly some truth to this belief: an ML model is, at bottom, an attempt to construct the best mathematical function for a given dataset, much as a regression model does, and more data and more parameters generally allow a better function to be constructed. Better models, in turn, yield better inferences, which are the actual tools that can be applied and monetized. ABI Research lauds efforts like those of Astera Labs to streamline memory resources in data center computers and recommends that other vendors follow suit. As more data and models move to the cloud, it is imperative to pay more attention to what takes place in such data centers. Large Language Models (LLMs), such as those behind ChatGPT, are a case in point: the more real, human-generated data they are fed, the better they become at mimicking human language. Standard interconnect technologies, such as PCI Express (PCIe) and CXL, will be critical to helping end users overcome memory bottlenecks in ML infrastructure. In the future, ML training will be streamlined end to end with hardware- and software-based technology, from data preparation to compute and memory capacity.
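The function-fitting view can be made concrete with a toy example. The following sketch is entirely synthetic and not tied to any vendor's models: it fits a simple least-squares regression to noisy samples of a known linear function and shows the held-out error generally shrinking as the training set grows.

# Toy illustration: a fitted regression typically improves with more data.
# Entirely synthetic; not tied to any vendor's models or results.
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return 3.0 * x + 1.0                                  # ground-truth linear relationship

x_test = np.linspace(0.0, 1.0, 200)
y_test = true_fn(x_test)

for n in (10, 100, 1000):
    x_train = rng.uniform(0.0, 1.0, n)
    y_train = true_fn(x_train) + rng.normal(0.0, 0.5, n)  # noisy observations
    slope, intercept = np.polyfit(x_train, y_train, 1)    # least-squares line fit
    mse = np.mean((slope * x_test + intercept - y_test) ** 2)
    print(f"n={n:5d}  held-out MSE={mse:.4f}")

The same dynamic, scaled up by many orders of magnitude, is what drives the memory and compute demands discussed above.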
Unfortunately, ML models have many limitations. For one, they can exhibit sexist and racist biases, as well as a lack of regional knowledge. One common mitigation has been to manually label large volumes of data, work that has often been outsourced to workers laboring in sweatshop-like conditions, which is another issue that ML vendors must be aware of. More importantly, however, training LLMs comes at an enormous cost in energy and memory. With concerns about climate change mounting, the situation can easily become unjustifiable and, indeed, is likely to invite new regulations and constraints. ABI Research also recommends that vendors take a wider view of these issues: moving data and model training to the cloud is often defended in terms of efficiency and lower costs, but these benefits pertain to the operations of individual vendors and do not take into account the overall cost of having to build more and even bigger data centers. Again, more efficient ways to perform compute, data transfer, and storage will create a more sustainable approach to ML development and commercialization.