Analyst(s): Olivier Blanchard
Publication Date: August 13, 2025
AMD’s Variable Graphics Memory upgrade for the Ryzen AI Max+ 395 enables LLMs of up to 128 billion parameters on Windows, with vision and MCP support, bringing data center-class AI to thin and light devices.
What is Covered in this Article:
- AMD’s Variable Graphics Memory upgrade for the Ryzen AI Max+ 395 enables LLMs of up to 128 billion parameters on Windows.
- Integration with llama.cpp’s Vulkan backend and LM Studio supports advanced vision and MCP capabilities.
- Expanded context length up to 256,000 tokens supports complex, token-heavy workflows.
- The upgrade positions AMD as the only vendor offering full-stack, cloud-to-client AI workload capability in thin and light systems.
The News: AMD has rolled out a significant update to its Variable Graphics Memory (VGM) technology, letting the Ryzen AI Max+ 395 (128GB) processor run large language models of up to 128 billion parameters through llama.cpp’s Vulkan backend on Windows. The update ships with the upcoming Adrenalin Edition 25.8.1 WHQL drivers and gives thin and light Windows systems up to 96GB of dedicated VGM for AI workloads.
With this upgrade, the Ryzen AI Max+ 395 becomes the first Windows AI PC processor to run Meta’s Llama 4 Scout, a 109 billion parameter model (17 billion active), with full vision and Model Context Protocol (MCP) support. It solidifies AMD’s position as a go-to platform for running models from 1 billion to 128 billion parameters locally through llama.cpp.
AMD Expands Windows AI Limits With 128B Parameter Model Capability
Analyst Take: This jump in AMD’s Variable Graphics Memory marks a significant inflection point for running AI locally on Windows, breaking past constraints that once confined workloads of this size to data center environments. By enabling 96GB of dedicated graphics memory on the Ryzen AI Max+ 395, AMD makes it possible to run everything from small assistants to massive vision models, all while supporting the long context lengths that advanced agentic AI solutions require.
Expanded Model Capacity
Now able to handle models of up to 128 billion parameters, the Ryzen AI Max+ 395 can run workloads requiring large blocks of dedicated VRAM without falling back on slower shared system memory. This includes heavyweight models like Mistral Large and Llama 4 Scout (which must load the full 109 billion parameters even though only 17 billion are active at once).
These models’ throughput of up to 15 tokens per second demonstrates that the Ryzen AI Max+ 395 can handle large on-device AI workloads smoothly. It further closes the gap between consumer-class and enterprise-class AI hardware by bringing massive model sizes to far more portable form factors.
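The memory math behind this claim is worth a quick sanity check. Below is a back-of-the-envelope sketch; the bits-per-weight figures are approximate averages for llama.cpp’s GGUF quantization formats (my assumption, not AMD’s published numbers), and real files add some metadata overhead.

```python
# Approximate GGUF weight footprints. Bits-per-weight values are rough
# averages (assumed for illustration), not exact per-file figures.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0}

def weight_gib(params_billion: float, fmt: str) -> float:
    """Estimated in-memory size of the model weights in GiB."""
    bits = params_billion * 1e9 * BITS_PER_WEIGHT[fmt]
    return bits / 8 / 1024**3

# Llama 4 Scout: all 109B parameters must be resident, even though only
# ~17B are active per token (mixture-of-experts routing).
for fmt in BITS_PER_WEIGHT:
    print(f"Llama 4 Scout (109B) @ {fmt}: ~{weight_gib(109, fmt):.0f} GiB")
# Q4_K_M lands around ~61 GiB -- comfortably inside 96GB of VGM, with
# headroom left over for the KV cache and activations.
```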
Context Length Advantage
The chip supports context lengths of up to 256,000 tokens with Flash Attention enabled and Q8 KV cache quantization, making it ideal for token-heavy work. AMD’s demos, which include summarizing its quarterly reports through MCP and processing entire arXiv research papers, show it handling over 19,000 and 21,000 tokens, respectively, in a single run. The Ryzen AI Max+ 395 lets a PC keep deep conversation history and context going across long sessions and enables advanced agent workflows, all securely (that is, inside an enterprise firewall) and without consuming cloud-based resources.
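The KV-cache arithmetic explains why Q8 quantization matters at that scale. A sketch using hypothetical dense-model dimensions (the layer count, head count, and head size below are illustrative assumptions; real models vary):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: float) -> float:
    """KV cache = 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    total = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total / 1024**3

# Hypothetical model dimensions, for illustration only:
LAYERS, KV_HEADS, HEAD_DIM, CTX = 48, 8, 128, 256_000

print(f"FP16 KV cache: ~{kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CTX, 2.0):.0f} GiB")
print(f"Q8   KV cache: ~{kv_cache_gib(LAYERS, KV_HEADS, HEAD_DIM, CTX, 1.0):.0f} GiB")
# Roughly halving the per-element size halves the cache (~47 GiB -> ~23 GiB
# here), which is what makes a 256K-token window practical alongside the
# model weights themselves.
```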
Flexible Quantization and Precision
The upgrade works with the wide range of GGUF quantization formats supported by llama.cpp, from lightweight Q4_K_M setups to high-precision 16-bit models. While Q4_K_M is great for general use, the chip can also run Q6 or Q8 variants for tasks that need extra accuracy, such as coding or vision-based inference. The architecture even supports running models like Google Gemma 3 27B in FP16, leveraging the high memory bandwidth of the Strix Halo platform. This flexibility lets users balance speed, accuracy, and model size against specific workload demands without running into hardware limits. A minimal sketch of what this looks like in practice follows.
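These knobs map onto familiar llama.cpp parameters. The sketch below uses the llama-cpp-python bindings; the model filename is a placeholder, and exact parameter names should be verified against the installed version.

```python
from llama_cpp import Llama  # pip install llama-cpp-python (Vulkan build)

# Placeholder GGUF path -- pick the quantization that fits the task:
# Q4_K_M for general use, Q6/Q8 variants for accuracy-sensitive work.
llm = Llama(
    model_path="./gemma-3-27b-it-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=-1,   # offload every layer into dedicated graphics memory
    n_ctx=32_768,      # scale toward 256K as memory headroom allows
    flash_attn=True,   # Flash Attention, as in AMD's demo configuration
)

out = llm("Summarize this quarterly report in three bullet points.",
          max_tokens=256)
print(out["choices"][0]["text"])
```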
MCP and Agentic Workflow Enablement
With full MCP and tool calling support when paired with compatible software, the Ryzen AI Max+ 395 fills a critical and timely gap in the effort to enable on-device AI agents. It can absorb the extra token load from MCP documentation and tool call returns, which can easily add tens of thousands of tokens, keeping multi-step, multi-tool workflows stable and responsive. That is an important distinction from smaller systems that would struggle under that kind of load. As MCP gains traction with developers like Meta, Google, and Mistral, this processor is set to become a strong local hub for complex, context-rich AI operations. Lastly, as the AI ecosystem shifts from a primarily training-focused phase to scaling inference, processors like the Ryzen AI Max+ 395 give PC users a solid future-proofing foundation, ensuring maximum sustained ROI from their PC investment even this early in the refresh cycle.
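To see where that token overhead comes from: MCP is JSON-RPC 2.0 under the hood, and every advertised tool schema and every result payload is serialized into the model’s context window. A schematic example (the method name follows the MCP spec, but the tool name and arguments are invented for illustration):

```python
import json

# Hypothetical MCP tool invocation. "tools/call" is the spec's method
# name; "fetch_quarterly_report" and its arguments are made up here.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "fetch_quarterly_report",
        "arguments": {"ticker": "AMD", "quarter": "Q2-2025"},
    },
}

# Each request like this, plus the tool schemas listed via tools/list
# and every returned result, lands in the context window -- which is how
# a multi-tool session accumulates tens of thousands of tokens.
print(json.dumps(request, indent=2))
```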
What to Watch:
- Adoption of AMD’s Variable Graphics Memory upgrade by developers targeting large-scale local LLM and VLM deployments.
- User uptake of extended context length capabilities for MCP-driven, token-heavy workflows.
- Performance scaling across various quantization levels and precision modes in real-world scenarios.
- Competitive responses from other vendors offering local AI model execution on Windows devices.
See the complete announcement on AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 on the AMD website.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.
Other insights from Futurum:
AMD Q2 FY 2025 Sales Beat Offset by MI308-Linked EPS Decline
AI PCs, Ryzen, and the Next Frontier in Personal Computing – Six Five On The Road
AMD Expands Telecom Role as Nokia Selects EPYC for 5G Cloud Platform
Image Credit: AMD
Author Information
Olivier Blanchard is Research Director, Intelligent Devices. He covers edge semiconductors and intelligent AI-capable devices for Futurum. In addition to having co-authored several books about digital transformation and AI with Futurum Group CEO Daniel Newman, Blanchard brings considerable experience demystifying new and emerging technologies, advising clients on how best to future-proof their organizations, and helping maximize the positive impacts of technology disruption while mitigating its potentially negative effects. Follow his extended analysis on X and LinkedIn.
