
AMD Expands Windows AI Limits With 128B Parameter Model Capability

AMD Upgrades Ryzen AI Max+ With 128B Parameter Model

Analyst(s): Olivier Blanchard
Publication Date: August 13, 2025

AMD’s Variable Graphics Memory upgrade for the Ryzen AI Max+ 395 enables up to 128 billion parameter LLMs on Windows with vision and MCP support, bringing data center-level AI to thin, light devices.

What is Covered in this Article:

  • AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 enables up to 128 billion parameter LLMs on Windows.
  • Integration with Vulkan llama.cpp and LM Studio supports advanced vision and MCP capabilities.
  • Expanded context length up to 256,000 tokens supports complex, token-heavy workflows.
  • The upgrade positions AMD as the only vendor offering full-stack, cloud-to-client AI workload capability in thin and light systems.

The News: AMD has rolled out a significant update to its Variable Graphics Memory (VGM) tech, letting the Ryzen AI Max+ 395 (128GB) processor handle large language models with up to 128 billion parameters in Vulkan llama.cpp on Windows. This update will be part of the upcoming Adrenalin Edition 25.8.1 WHQL drivers, giving thin and light Windows systems the ability to use up to 96GB of VGM for AI tasks.

With this upgrade, the Ryzen AI Max+ 395 becomes the first Windows AI PC processor to run Meta’s 109-billion-parameter Llama 4 Scout (17 billion active) with full vision and Model Context Protocol (MCP) support. It solidifies AMD’s position as a go-to platform for running models from 1 billion to 128 billion parameters locally through llama.cpp.


Analyst Take: This jump in AMD’s Variable Graphics Memory capacity marks a significant inflection point for running AI locally on Windows, breaking past constraints that once confined workloads of this size to data center environments. By enabling 96GB of dedicated graphics memory on the Ryzen AI Max+ 395, AMD makes it possible to run anything from small assistants to massive vision models, all while supporting the long context lengths that advanced agentic AI solutions require.

Expanded Model Capacity

Now able to handle up to 128 billion parameter models, the Ryzen AI Max+ 395 can run tasks requiring large blocks of VRAM without falling back on slower shared memory. This includes heavyweight models like Mistral Large and Llama 4 Scout (which must load the full 109 billion parameters even if only 17 billion are active at once).
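To put the 96GB VGM ceiling in context, the back-of-the-envelope sketch below estimates the weight footprint of a 109-billion-parameter model at common llama.cpp quantization levels. The bits-per-weight figures are approximate community estimates for GGUF quantizations, not official numbers:

```python
# Rough check that a 109B-parameter model (Llama 4 Scout class) fits in a
# 96 GB VGM pool at common llama.cpp quantization levels. Bits-per-weight
# values are approximations, not official GGUF figures.

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,   # ~4.85 bpw is a common estimate for Q4_K_M
    "Q6_K":   6.56,
    "Q8_0":   8.50,
    "FP16":   16.0,
}

def weights_gb(params_billions: float, quant: str) -> float:
    """Approximate GGUF weight size in GB (10^9 bytes) for a given quant."""
    bits = BITS_PER_WEIGHT[quant] * params_billions * 1e9
    return bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    size = weights_gb(109, quant)
    fits = "fits" if size <= 96 else "exceeds"
    print(f"{quant:7s} ~{size:6.1f} GB -> {fits} 96 GB VGM")
```

At roughly 66 GB, a Q4_K_M build of a 109B model leaves headroom in the 96GB pool for KV cache and activations, which is why the long-context demos described below are feasible at all.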

These models’ ability to achieve speeds of up to 15 tokens per second is proof that Ryzen AI Max+ 395 can handle large on-device AI workloads smoothly. It further closes the gap between consumer-class and enterprise-class AI hardware by bringing massive model sizes to devices with more portable form factors.

Context Length Advantage

The chip supports context lengths of up to 256,000 tokens with Flash Attention ON and KV Cache Q8, making it ideal for token-heavy work. AMD’s demos – which include summarizing its quarterly reports through MCP and processing entire arXiv research papers – show it handling over 19,000 and 21,000 tokens, respectively, in a single run. The Ryzen AI Max+ 395 can let a PC keep deep conversation history and context going across long sessions, enable advanced agent workflows, and do so securely (that is, inside an enterprise firewall) without consuming cloud-based resources.
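The KV cache is why cache quantization matters at these context lengths. The sketch below uses illustrative, assumed transformer dimensions (layer count, KV heads, head size) rather than any published model spec, to show the scale of memory an 8-bit (Q8) cache saves versus FP16 at 256,000 tokens:

```python
# Back-of-the-envelope KV-cache sizing for a 256,000-token context.
# The layer/head/dim figures below are illustrative assumptions for a
# mid-size transformer with grouped-query attention, not published specs.

def kv_cache_gb(ctx_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem=1):
    """KV cache = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    total = 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem
    return total / 1e9

# Hypothetical model: 48 layers, 8 KV heads (GQA), head_dim of 128
q8  = kv_cache_gb(256_000, 48, 8, 128, bytes_per_elem=1)  # Q8 cache
f16 = kv_cache_gb(256_000, 48, 8, 128, bytes_per_elem=2)  # FP16 cache
print(f"Q8 KV cache at 256k tokens: ~{q8:.1f} GB (FP16 would be ~{f16:.1f} GB)")
```

Even under these modest assumptions the Q8 cache runs to roughly 25 GB at full context, so halving cache precision is what keeps weights plus cache inside the 96GB VGM budget.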

Flexible Quantization and Precision

The upgrade works with a wide range of quantization formats in llama.cpp and GGUF, from lightweight Q4_K_M setups to high-precision 16-bit models. While Q4_K_M is great for general use, the chip can also run Q6 or Q8 quantizations for tasks that need extra accuracy, such as coding or vision-based inference. The architecture even supports running models like Google Gemma 3 27B in FP16, drawing on the high memory bandwidth of the Strix Halo platform. This flexibility lets users balance speed, accuracy, and model size to match specific workload demands without running into hardware limits.
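The speed-versus-accuracy trade-off above can be framed as a simple selection rule: pick the highest-precision quantization whose weights fit the memory budget. This is an illustrative helper only, with approximate bits-per-weight values, not an official GGUF sizing tool:

```python
# Illustrative helper that picks the highest-precision quantization whose
# weights fit a given VRAM budget. Bits-per-weight values are approximate
# community estimates, not official GGUF figures.

QUANTS = [("FP16", 16.0), ("Q8_0", 8.5), ("Q6_K", 6.56), ("Q4_K_M", 4.85)]

def best_quant(params_b: float, budget_gb: float):
    """Return the most precise quant whose weights fit in budget_gb, else None."""
    for name, bpw in QUANTS:  # ordered highest precision first
        if params_b * bpw / 8 <= budget_gb:
            return name
    return None

print(best_quant(27, 96))    # a Gemma-3-27B-class model fits at FP16 (~54 GB)
print(best_quant(109, 96))   # a Llama-4-Scout-class model needs Q6_K (~89 GB)
```

This mirrors the article’s claims: a 27B model fits comfortably at FP16 inside 96GB of VGM, while a 109B model needs quantization to do so.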

MCP and Agentic Workflow Enablement

With full MCP and tool calling support when paired with compatible software, the Ryzen AI Max+ 395 fills a critical and timely gap in the effort to enable on-device AI agents. It can also absorb the extra token load from MCP documentation and tool-call returns, which can easily add tens of thousands of tokens, keeping multi-step, multi-tool workflows stable and responsive – an important distinction against smaller systems that would struggle with that kind of workload. As MCP gains traction with model developers like Meta, Google, and Mistral, this processor is set to become a strong local hub for complex, context-rich AI operations. Lastly, as the AI ecosystem begins to shift its focus from training to scaling inference, processors like the Ryzen AI Max+ 395 give PC users a solid future-proofing foundation that helps maximize sustained ROI from their PC investment, even this early in the refresh cycle.
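For readers unfamiliar with MCP’s mechanics, a tool call is a JSON-RPC 2.0 request from client to server. The sketch below shows the general envelope shape per the MCP specification’s tools/call method; the tool name and arguments are hypothetical:

```python
import json

# Minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke
# a server-side tool. The tool name and arguments are hypothetical; the
# envelope follows the Model Context Protocol's "tools/call" request shape.

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = make_tool_call(1, "summarize_report", {"quarter": "Q2", "year": 2025})
print(msg)
```

Each such request, and especially each tool result returned to the model, is fed back into the context window, which is why the token-load point above matters for agentic workflows.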

What to Watch:

  • Adoption of AMD’s Variable Graphics Memory upgrade by developers targeting large-scale local LLM and VLM deployments.
  • User uptake of extended context length capabilities for MCP-driven, token-heavy workflows.
  • Performance scaling across various quantization levels and precision modes in real-world scenarios.
  • Competitive responses from other vendors offering local AI model execution on Windows devices.

See the complete announcement on AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 on the AMD website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

AMD Q2 FY 2025 Sales Beat Offset by MI308-Linked EPS Decline

AI PCs, Ryzen, and the Next Frontier in Personal Computing – Six Five On The Road

AMD Expands Telecom Role as Nokia Selects EPYC for 5G Cloud Platform

Image Credit: AMD

Author Information

Olivier Blanchard

Olivier Blanchard is Research Director, Intelligent Devices. He covers edge semiconductors and intelligent AI-capable devices for Futurum. In addition to having co-authored several books about digital transformation and AI with Futurum Group CEO Daniel Newman, Blanchard brings considerable experience demystifying new and emerging technologies, advising clients on how best to future-proof their organizations, and helping maximize the positive impacts of technology disruption while mitigating their potentially negative effects. Follow his extended analysis on X and LinkedIn.

