The News: On September 26, large language model (LLM) fine-tuning specialist Lamini announced in a blog post the availability of its LLM Superstation, a graphics processing unit (GPU) compute platform powered by AMD GPUs and optimized to run private enterprise LLMs. Here are the key details:
- The solution combines Lamini’s enterprise LLM infrastructure with AMD Instinct MI210 and MI250 accelerators.
- Superstation can run Llama 2 70B out of the box. Lamini claims the setup costs one-tenth as much as running on Amazon Web Services (AWS). (Note that it is unclear exactly what is being compared.)
- LLM Superstation is available now, both in the cloud and on premises. Lamini highlights this availability compared with the current 52-week lead time for NVIDIA H100s.
- Lamini has been running its fine-tuned LLMs on AMD Instinct GPUs “for the past year.”
- In a comparison of AMD’s current accelerators versus NVIDIA’s, the Instinct MI250X is comparable in compute power and memory to the NVIDIA A100 but is not nearly as powerful as the NVIDIA H100.
- Lamini ran tests that show AMD’s ROCm software “provides a solid foundation for high-performance applications like fine-tuning LLMs.”
Read Lamini’s post about the AMD Superstation here.
Read Lamini’s NVIDIA-AMD Oven-Grill post on X here.
AMD: Our GPUs Running LLMs
Analyst Take: LLM workloads are not the exclusive purview of NVIDIA. The Lamini-AMD initiative has implications for the AI compute market. Here is my take.
Comparing Apples to Oranges
The news that the AMD-Lamini LLM Superstation can run LLMs is welcome, but the system is built around Lamini’s fine-tuning software, which is designed to reduce the workload; it is unclear whether the Superstation could run an LLM without fine-tuning it. Lamini cites Llama 2 70B, an open source model, as its example. Will the setup run other LLMs?
The AMD GPUs compare well with NVIDIA’s A100s but are not nearly as powerful as NVIDIA’s H100s, so the fair comparison is with the A100, not the H100.
On the plus side, some developers feel that improvements in AMD’s GPU software have been an equalizer. On Y Combinator’s Hacker News, one commenter posted this about the Lamini-AMD initiative:
“The hard part about using any AI Chips other than NVIDIA has been software. ROCm is finally at the point where it can train and deploy LLMs like Llama 2 in production.”
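The commenter’s point can be made concrete. On a ROCm build of PyTorch, AMD accelerators are exposed through the same torch.cuda interface that NVIDIA-targeted code already uses, so a standard Hugging Face workflow runs largely unchanged. The sketch below is a minimal illustration under those assumptions, not Lamini’s actual stack; it presumes a ROCm PyTorch wheel, the transformers and accelerate libraries, enough Instinct GPUs to hold the model, and access to the gated Llama 2 repository on Hugging Face.

```python
# Minimal sketch: the usual PyTorch/Transformers code path, unmodified,
# running on AMD Instinct GPUs via a ROCm build of PyTorch.
# Illustration only -- this is not Lamini's software stack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# On a ROCm wheel, torch.version.hip is set and torch.cuda.* reports the
# AMD devices; no AMD-specific changes are needed in user code.
print("HIP runtime:", torch.version.hip)
print("Accelerators visible:", torch.cuda.device_count())

model_id = "meta-llama/Llama-2-70b-hf"  # gated repo; assumes you have access
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~2 bytes per parameter
    device_map="auto",          # shard across available GPUs (needs accelerate)
)

prompt = "Summarize the key risks in our vendor contracts:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```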
Lamini CTO Greg Diamos, who was an early architect of CUDA at NVIDIA, said, “Using Lamini software, ROCm has achieved software parity with CUDA for LLMs. We chose the Instinct MI250 as the foundation for Lamini because it runs the biggest models that our customers demand and integrates fine-tuning optimizations. We use the large HBM capacity (128 GB) on MI250 to run bigger models with lower software complexity than clusters of A100s.”
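Back-of-envelope arithmetic shows why that 128 GB figure matters. The sketch below is my own rough sizing, not Lamini’s methodology: it counts fp16 weights at roughly 2 bytes per parameter and uses a common rule of thumb of about 16 bytes per parameter for full fine-tuning with mixed-precision Adam, while ignoring activations and KV cache, which would push the totals higher.

```python
# Rough sizing of Llama 2 70B against per-accelerator HBM capacity.
# Illustration only: activations and KV cache are ignored.
import math

params = 70e9

# fp16 weights alone (~2 bytes per parameter) -- the floor for inference.
weights_gb = params * 2 / 1e9        # ~140 GB

# Common rule of thumb for full fine-tuning with mixed-precision Adam:
# ~16 bytes per parameter (weights + gradients + optimizer state).
finetune_gb = params * 16 / 1e9      # ~1,120 GB

for label, need_gb in [("fp16 weights (inference floor)", weights_gb),
                       ("full fine-tune (~16 B/param)", finetune_gb)]:
    for device, hbm_gb in [("MI250 (128 GB)", 128), ("A100 (80 GB)", 80)]:
        n = math.ceil(need_gb / hbm_gb)
        print(f"{label}: >= {n} x {device}")
```

By this rough math, a full fine-tune of a 70B model needs roughly nine MI250s versus fourteen 80 GB A100s just for weights and optimizer state, which is the practical meaning of running bigger models on a smaller, simpler cluster.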
Availability
The backlog for NVIDIA H100s (a 52-week lead time, according to Lamini) is an extremely compelling argument for enterprises to consider the AMD-Lamini Superstation. Perhaps it is not as powerful and might not offer the broadest range of LLM options, but it is a system that can be operationalized right now, and with the pace of AI innovation, that is a critical consideration.
Seeking Options To Reduce AI Compute
Along with availability, perhaps the winds of AI compute are set to change anyway. The extraordinarily large AI compute loads necessary to run training and inference for giant LLMs are probably not economically viable or sustainable. The trend has been toward “smaller” LLMs and toward the kind of fine-tuning championed by Lamini and others. Many AI workloads might not require the horsepower that NVIDIA’s H100s provide.
Conclusions
Despite some limitations, the AMD-Lamini Superstation is a viable option for enterprises to consider for deploying LLMs. Savvy enterprises that cannot wait a calendar year to run AI workloads will be test-driving the system.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
AMD Revenue Hits $5.4 Billion in Q2, Down 18% YoY, But Beats Estimates
Hybrid Cloud Journey: How Nutanix, AMD and HPE Power Modern Apps | Futurum Tech Webcast
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis with an emphasis on mobile technology and identifying trends and opportunities for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.