

AMD Upgrades Ryzen AI Max+ With 128B Parameter Model

Analyst(s): Olivier Blanchard
Publication Date: August 13, 2025

AMD’s Variable Graphics Memory upgrade for the Ryzen AI Max+ 395 enables up to 128 billion parameter LLMs on Windows with vision and MCP support, bringing data center-level AI to thin and light devices.

What is Covered in this Article:

  • AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 enables up to 128 billion parameter LLMs on Windows.
  • Integration with Vulkan llama.cpp and LM Studio supports advanced vision and MCP capabilities.
  • Expanded context length up to 256,000 tokens supports complex, token-heavy workflows.
  • The upgrade positions AMD as the only vendor offering full-stack, cloud-to-client AI workload capability in thin and light systems.

The News: AMD has rolled out a significant update to its Variable Graphics Memory (VGM) tech, letting the Ryzen AI Max+ 395 (128GB) processor handle large language models with up to 128 billion parameters in Vulkan llama.cpp on Windows. This update will be part of the upcoming Adrenalin Edition 25.8.1 WHQL drivers, giving thin and light Windows systems the ability to use up to 96GB of VGM for AI tasks.

With this upgrade, the Ryzen AI Max+ 395 becomes the first Windows AI PC processor to run Meta’s Llama 4 Scout, a 109 billion parameter model (17 billion active), with full vision and Model Context Protocol (MCP) support. It solidifies AMD’s position as a go-to platform for running models from 1 billion to 128 billion parameters locally through llama.cpp.

AMD Expands Windows AI Limits With 128B Parameter Model Capability

Analyst Take: This jump in AMD’s Variable Graphics Memory marks a significant inflection point for running AI locally on Windows, as it breaks past constraints that once confined workloads of this size to data center environments. By enabling 96GB of dedicated graphics memory on the Ryzen AI Max+ 395, AMD makes it possible to run anything from small assistants to massive vision models, all while supporting the long context lengths needed by advanced agentic AI solutions.

Expanded Model Capacity

Now able to handle up to 128 billion parameter models, the Ryzen AI Max+ 395 can run tasks requiring large blocks of VRAM without falling back on slower shared memory. This includes heavyweight models like Mistral Large and Llama 4 Scout (which must load all 109 billion parameters into memory even though only 17 billion are active at once).

Demonstrated speeds of up to 15 tokens per second on these models show that the Ryzen AI Max+ 395 can handle large on-device AI workloads smoothly. It further closes the gap between consumer-class and enterprise-class AI hardware by bringing massive model sizes to devices with more portable form factors.
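For readers who want to see what this looks like in practice, the minimal sketch below shows one common way to load a large GGUF model through llama-cpp-python, a popular Python binding for llama.cpp; the file name and prompt are hypothetical, and since AMD’s demos used the Vulkan llama.cpp backend, a Vulkan-enabled build of the library is assumed.

```python
# Minimal sketch, assuming the llama-cpp-python binding and a locally
# downloaded GGUF file (file name and prompt are hypothetical). AMD's
# demos use the Vulkan llama.cpp backend, so this assumes a
# Vulkan-enabled build of the library.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-4-scout-109b-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload every layer into the graphics memory pool
    n_ctx=8192,       # modest context for a first smoke test
    verbose=False,
)

out = llm("Explain Variable Graphics Memory in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```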

Context Length Advantage

The chip supports context lengths up to 256,000 tokens with Flash Attention enabled and a Q8-quantized KV cache, making it ideal for token-heavy work. AMD’s demos, which include summarizing its quarterly reports through MCP and processing entire arXiv research papers, show it handling over 19,000 and 21,000 tokens (respectively) in a single run. The Ryzen AI Max+ 395 can enable a PC to keep deep conversation history and context going across long sessions, as well as enable advanced agent workflows, and do so securely (as in: inside of an enterprise firewall) and without consuming cloud-based resources.
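A hedged sketch of that long-context configuration, again via llama-cpp-python, is shown below; the parameter names follow that binding’s API, and the model file is hypothetical.

```python
# Sketch of the long-context setup described above: Flash Attention on
# and the KV cache quantized to Q8, via llama-cpp-python. Parameter
# names follow that binding's API; the model file is hypothetical. A
# 256,000-token KV cache is memory-hungry even at Q8, which is exactly
# what the 96GB graphics memory pool is for.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-4-scout-109b-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=-1,
    n_ctx=256_000,                     # extended context window
    flash_attn=True,                   # Flash Attention enabled
    type_k=llama_cpp.GGML_TYPE_Q8_0,   # Q8 KV cache (keys)
    type_v=llama_cpp.GGML_TYPE_Q8_0,   # Q8 KV cache (values)
)

paper_text = "..."  # paste a full arXiv paper here; AMD's demo runs exceeded 21,000 tokens
out = llm(f"Summarize the following paper:\n\n{paper_text}", max_tokens=512)
print(out["choices"][0]["text"])
```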

Flexible Quantization and Precision

The upgrade works with a wide range of quantization formats in llama.cpp and GGUF, from lightweight Q4_K_M setups to high-precision 16-bit models. Q4_K_M is well suited to general use, while Q6 or Q8 quantization serves tasks that need extra accuracy, such as coding or vision-based inference. The architecture even supports running models like Google’s Gemma 3 27B in FP16, using the high memory bandwidth of the Strix Halo platform. This flexibility lets users balance speed, accuracy, and model size to match specific workload demands without running into hardware limits.
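Quantization choice is largely a memory-budget exercise. The back-of-the-envelope sketch below illustrates why; the bits-per-weight values are rough community approximations for GGUF formats, not AMD-published numbers.

```python
# Rough weight-memory estimates for the formats mentioned above. The
# bits-per-weight values are approximate community figures for GGUF
# quantizations, not AMD-published numbers; KV cache is excluded.
GIB = 1024**3

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF weight footprint in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB

for name, bpw in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5), ("FP16", 16.0)]:
    print(f"Llama 4 Scout (109B) at {name}: ~{weight_gib(109, bpw):.0f} GiB")

print(f"Gemma 3 27B at FP16: ~{weight_gib(27, 16.0):.0f} GiB")
```

On these rough numbers, only a Q4-class build of Llama 4 Scout fits inside a 96GB graphics memory pool, while Gemma 3 27B fits comfortably even at FP16, which lines up with the model-and-precision pairings AMD highlights.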

MCP and Agentic Workflow Enablement

With full MCP and tool calling support when paired with compatible software, the Ryzen AI Max+ 395 supplies a critical and timely missing link in the effort to enable on-device AI agents. It can also absorb the extra token load from MCP documentation and tool call returns, which can easily add tens of thousands of tokens. This keeps multi-step, multi-tool workflows stable and responsive, an important distinction against smaller systems that would struggle with that kind of workload. As MCP gains traction with model providers like Meta, Google, and Mistral, this processor is set to become a strong local hub for complex, context-rich AI operations. Lastly, as the AI ecosystem begins to shift its focus from training to the scaling of inference, processors like the Ryzen AI Max+ 395 give PC users a solid future-proofing foundation that helps ensure maximum sustained ROI from their PC investment, even this early in the refresh cycle.
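To make the mechanics concrete, the sketch below shows a tool-calling request against a local OpenAI-compatible endpoint such as the one LM Studio’s local server exposes; the model identifier and tool schema are illustrative, and real MCP servers surface their tools to the model through a host application rather than through this raw API.

```python
# Hedged sketch of local tool calling against an OpenAI-compatible
# endpoint such as LM Studio's local server (default: localhost:1234).
# The model identifier and tool schema are illustrative; real MCP
# servers surface tools through a host application, not this raw API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # local server; key is a placeholder

tools = [{
    "type": "function",
    "function": {
        "name": "get_quarterly_revenue",  # hypothetical MCP-style tool
        "description": "Return reported revenue for a given fiscal quarter.",
        "parameters": {
            "type": "object",
            "properties": {"quarter": {"type": "string"}},
            "required": ["quarter"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-4-scout",  # whichever model is loaded locally
    messages=[{"role": "user", "content": "What was revenue in Q2 2025?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```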

What to Watch:

  • Adoption of AMD’s Variable Graphics Memory upgrade by developers targeting large-scale local LLM and VLM deployments.
  • User uptake of extended context length capabilities for MCP-driven, token-heavy workflows.
  • Performance scaling across various quantization levels and precision modes in real-world scenarios.
  • Competitive responses from other vendors offering local AI model execution on Windows devices.

See the complete announcement on AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 on the AMD website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

AMD Q2 FY 2025 Sales Beat Offset by MI308-Linked EPS Decline

AI PCs, Ryzen, and the Next Frontier in Personal Computing – Six Five On The Road

AMD Expands Telecom Role as Nokia Selects EPYC for 5G Cloud Platform

Image Credit: AMD

Author Information

Olivier Blanchard

Olivier Blanchard is Research Director, Intelligent Devices. He covers edge semiconductors and intelligent AI-capable devices for Futurum. In addition to having co-authored several books about digital transformation and AI with Futurum Group CEO Daniel Newman, Blanchard brings considerable experience demystifying new and emerging technologies, advising clients on how best to future-proof their organizations, and helping maximize the positive impacts of technology disruption while mitigating their potentially negative effects. Follow his extended analysis on X and LinkedIn.
