Memcon 2024: Memory Technologies Are a Key Component to Scale AI

In the second year of the Memcon conference, it was all memory and how to feed the “beast,” the GPU. In 2023, the focus was on CXL technology and how and where it would be adopted. With the explosion of generative AI, the commentary on memory types has shifted to high-bandwidth memory (HBM) and the bandwidth needed to maximize GPU performance. Overall, there was general agreement that CXL, while it addresses capacity, cannot deliver the bandwidth needed to train generative AI applications. Instead, we will likely see CXL memory modes applied to inference or, more likely, to in-memory database applications such as SAP HANA.

On a closely related note to generative AI and HBM, the community addressed the pressing issues of designing for the scale this new AI will require. The first area is advanced packaging for memory, for instance scaling HBM to 4, 8, 16, or more layers. The question is how many layers before signaling or mechanical design becomes an issue. The second area is scalable networks: Ethernet or InfiniBand (IB)? These are big bets that organizations will need to make in their deployments. The trend seems to be Ethernet, but the high-end systems will probably cling to IB.

The next issue, discussed throughout the two days, was cooling and energy. The more processors, cores, and memory, the hotter these systems become. The major research labs have turned to liquid cooling. Expect this to become the norm for AI systems as they grow, at least for self-contained liquid-cooled systems. Perhaps we will be back to water-cooled facilities in the near future.

Methods and techniques being deployed to overcome memory constraints were also presented. Tejas Chopra of Netflix discussed the gyrations the company’s data scientists and programmers go through, spanning model pruning, efficient mini-batch selection, data quantization, and paging. Asked whether this goes away with new memory offerings, he said no; new memory simply advances the capabilities of the current environment. Some of these methods were echoed by Shell (the energy company), which described breaking data into cubes to fit into memory and leaning on more checkpointing and compression.
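To make the quantization technique above concrete, here is a minimal sketch, not Netflix’s or Shell’s actual code, of how linearly quantizing float32 data to int8 cuts its in-memory footprint to roughly a quarter at the cost of a bounded approximation error:

```python
import numpy as np

def quantize_to_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Linearly map float32 values into the int8 range [-127, 127]."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:  # all-zero input: any scale works
        scale = 1.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 values."""
    return q.astype(np.float32) * scale

# One million float32 values: ~4 MB in memory.
data = np.random.randn(1_000_000).astype(np.float32)
q, scale = quantize_to_int8(data)   # ~1 MB as int8
approx = dequantize(q, scale)

print(f"original: {data.nbytes} bytes, quantized: {q.nbytes} bytes")
```

The worst-case reconstruction error per element is half the scale step, which is often acceptable for training data and model weights; the same idea, applied per tensor or per channel, underlies the quantization schemes used in production ML systems.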

So what about CXL? Samsung was a major sponsor of the event and used the forum to launch its CXL Memory Module – Box (CMM-B), with 2 TB of memory for memory-hungry databases, alongside its 12-stack HBM3E offering. There was some question as to when we will see this in real deployments, an understandable position given the five years we have been talking about CXL. Well, we are getting there. Very exciting were the joint-development partnerships with VMware and Red Hat.

VMware will be releasing support for Samsung’s CMM-H, which is CXL 2.0 pooled memory. In a release planned for later in 2024, vSphere will support tiered memory, enabling disaggregated memory to feed core capacity. The result: increased VM density per core, more memory for database-driven apps such as SAP HANA, and cluster-wide memory for large-scale environments. Given all the noise around VMware licensing, this might offer a bit of relief, depending on how the next release is priced.

Red Hat and Samsung had previously announced the qualification of Samsung’s CXL DRAM memory module (CMM-D) for memory pooling with RHEL 9.3.

We are making progress with the memory constraints but do not expect workarounds to disappear. Rather, the applications will continue to consume whatever we can feed them. So back to work memory engineers!

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other Insights from The Futurum Group:

The Six Five Talk Samsung’s Memory Tech Day

Memory Market: Call it a Comeback?

Marvell Industry Analyst Day 2023: Accelerated Computing Takes Off

Author Information

Camberley Bates

Camberley brings over 25 years of executive experience leading sales and marketing teams at Fortune 500 firms. Before joining The Futurum Group, she led the Evaluator Group, an information technology analyst firm as Managing Director.

Her career has spanned all elements of sales and marketing, giving her a 360-degree view of addressing challenges and delivering solutions, gained from crossing the boundary between sales and channel engagement at large enterprise vendors and at her own 100-person IT services firm.

Camberley has provided Global 250 companies and startups with go-to-market strategies, created the new market category “MAID” as Vice President of Marketing at COPAN, and led a worldwide marketing team, including channels, as a VP at VERITAS. At GE Access, a $2B distribution company, she served as VP of a new division, growing it from $14 million to $500 million, and built a successful 100-person IT services firm. Camberley began her career at IBM in sales and management.

She holds a Bachelor of Science in International Business from California State University – Long Beach and executive certificates from Wellesley and Wharton School of Business.
