AMD Expands Windows AI Limits With 128B Parameter Model Capability

AMD Upgrades Ryzen AI Max+ With 128B Parameter Model

Analyst(s): Olivier Blanchard
Publication Date: August 13, 2025

AMD’s Variable Graphics Memory upgrade for the Ryzen AI Max+ 395 enables up to 128 billion parameter LLMs on Windows with vision and MCP support, bringing data center-level AI to thin, light devices.

What is Covered in this Article:

  • AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 enables up to 128 billion parameter LLMs on Windows.
  • Integration with Vulkan llama.cpp and LM Studio supports advanced vision and MCP capabilities.
  • Expanded context length up to 256,000 tokens supports complex, token-heavy workflows.
  • The upgrade positions AMD as the only vendor offering full-stack, cloud-to-client AI workload capability in thin and light systems.

The News: AMD has rolled out a significant update to its Variable Graphics Memory (VGM) tech, letting the Ryzen AI Max+ 395 (128GB) processor handle large language models with up to 128 billion parameters in Vulkan llama.cpp on Windows. This update will be part of the upcoming Adrenalin Edition 25.8.1 WHQL drivers, giving thin and light Windows systems the ability to use up to 96GB of VGM for AI tasks.

With this upgrade, the Ryzen AI Max+ 395 becomes the first Windows AI PC processor to run Meta’s Llama 4 Scout, a 109 billion parameter model (17 billion active), with full vision and Model Context Protocol (MCP) support. It solidifies AMD’s position as a go-to platform for running models from 1 billion to 128 billion parameters locally through llama.cpp.


Analyst Take: This jump in AMD’s Variable Graphics Memory marks a significant inflection point for running AI locally on Windows, as it removes constraints that once confined workloads of this size to data center environments. By enabling up to 96GB of dedicated graphics memory on the Ryzen AI Max+ 395, AMD makes it possible to run anything from small assistants to massive vision models, all while supporting the long context lengths needed by advanced agentic AI solutions.

Expanded Model Capacity

The Ryzen AI Max+ 395 can now handle models of up to 128 billion parameters, running workloads that require large blocks of VRAM without falling back on slower shared memory. This includes heavyweight models like Mistral Large and Llama 4 Scout (which must load all 109 billion parameters even though only 17 billion are active at any one time).
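
The gap between loaded and active parameters can be illustrated with some back-of-the-envelope arithmetic. The following is a minimal sketch, not AMD's sizing method: the figure of 4.5 bits per weight is an assumed approximation for a 4-bit quantization, and the function name is hypothetical. Only the 109 billion, 17 billion, and 96GB figures come from the announcement.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes): parameters x bits / 8."""
    return params_billion * bits_per_weight / 8.0


VGM_BUDGET_GB = 96.0            # maximum VGM stated in the article
ASSUMED_BITS_PER_WEIGHT = 4.5   # assumption: rough effective size of a 4-bit quant

# A mixture-of-experts model keeps every expert resident in memory,
# but reads only the active subset of weights for each generated token.
resident = weight_footprint_gb(109, ASSUMED_BITS_PER_WEIGHT)
active = weight_footprint_gb(17, ASSUMED_BITS_PER_WEIGHT)

print(f"resident weights: {resident:.1f} GB (budget {VGM_BUDGET_GB:.0f} GB)")
print(f"weights read per token: {active:.1f} GB")
print("fits in VGM:", resident <= VGM_BUDGET_GB)
```

Under these assumptions the memory requirement is set by the full 109 billion parameters, while the per-token work scales with the 17 billion active ones, which is why capacity rather than compute is the first hurdle for a model of this type.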

That these models reach speeds of up to 15 tokens per second indicates that the Ryzen AI Max+ 395 can handle large on-device AI workloads smoothly. It further closes the gap between consumer-class and enterprise-class AI hardware by bringing massive model sizes to devices with more portable form factors.

Context Length Advantage

The chip supports context lengths of up to 256,000 tokens with Flash Attention enabled and the KV cache quantized to Q8, making it well suited to token-heavy work. AMD’s demos, which include summarizing its quarterly reports through MCP and processing entire arXiv research papers, show it handling over 19,000 and 21,000 tokens (respectively) in a single run. This lets a PC maintain deep conversation history and context across long sessions and run advanced agent workflows, securely (that is, inside an enterprise firewall) and without consuming cloud-based resources.
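
To see why KV cache precision matters at this context length, consider a rough estimate of cache size. This is a sketch under stated assumptions: the layer count, KV head count, and head dimension below are hypothetical values chosen for illustration, not the architecture of any model named in this article, and Q8 is treated as roughly one byte per element, ignoring per-block scale overhead.

```python
def kv_cache_gb(n_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: float) -> float:
    """Approximate KV cache size in GB (10^9 bytes).

    Each layer stores one key and one value vector per KV head per token,
    hence the leading factor of 2.
    """
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim
    return elems_per_token * bytes_per_elem * n_tokens / 1e9


# Hypothetical architecture, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 48, 8, 128
CONTEXT_TOKENS = 256_000  # maximum context length stated in the article

q8_cache = kv_cache_gb(CONTEXT_TOKENS, N_LAYERS, N_KV_HEADS, HEAD_DIM, 1.0)
f16_cache = kv_cache_gb(CONTEXT_TOKENS, N_LAYERS, N_KV_HEADS, HEAD_DIM, 2.0)

print(f"KV cache at ~8 bits:  {q8_cache:.1f} GB")
print(f"KV cache at 16 bits: {f16_cache:.1f} GB")
```

For this hypothetical shape, halving the cache precision frees tens of gigabytes at full context, memory that would otherwise compete with the model weights inside the same 96GB allocation.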

Flexible Quantization and Precision

The upgrade works with a wide range of quantization formats in llama.cpp and GGUF, from lightweight Q4_K_M setups to high-precision 16-bit models. While Q4_K_M is well suited to general use, the platform can also run Q6 or Q8 for tasks that need extra accuracy, like coding or vision-based inference. The architecture even supports running models like Google Gemma 3 27B in FP16, using the high memory bandwidth of the Strix Halo platform. This flexibility lets users balance speed, accuracy, and model size to match specific workload demands with fewer hardware constraints.
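
The trade-off between precision and model size can be sketched as a simple fit check. The bits-per-weight values below are assumed round approximations rather than exact GGUF figures, and the helper name and the reserve parameter are hypothetical; the model sizes and the 96GB budget are taken from the article.

```python
# Assumption: approximate effective bits per weight for each format.
ASSUMED_BITS = {"Q4": 4.5, "Q6": 6.5, "Q8": 8.5, "FP16": 16.0}


def fitting_formats(params_billion: float, budget_gb: float,
                    reserve_gb: float = 0.0) -> list[str]:
    """Return the formats whose weights fit in the budget.

    reserve_gb is memory set aside for everything that is not weights,
    such as the KV cache and runtime buffers.
    """
    available = budget_gb - reserve_gb
    return [
        name
        for name, bits in ASSUMED_BITS.items()
        if params_billion * bits / 8.0 <= available
    ]


print("27B model, 96 GB: ", fitting_formats(27, 96))
print("109B model, 96 GB:", fitting_formats(109, 96))
print("109B model, 96 GB with 10 GB reserved:",
      fitting_formats(109, 96, reserve_gb=10))
```

With these assumed sizes, a 27 billion parameter model has room for every format up to FP16, while a 109 billion parameter model is limited to the lower-precision options, and reserving memory for a long context narrows the choice further.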

MCP and Agentic Workflow Enablement

With full MCP and tool calling support when paired with compatible software, the Ryzen AI Max+ 395 fills a critical and timely gap in the effort to enable on-device AI agents. It can also handle the extra token load from MCP documentation and tool call returns, which can easily add tens of thousands of tokens. This keeps multi-step, multi-tool workflows stable and responsive, an important advantage over smaller systems that would struggle with that kind of workload. As MCP gains traction with developers like Meta, Google, and Mistral, this processor is set to become a strong local hub for complex, context-rich AI operations. Lastly, as the AI ecosystem begins to shift from a primary focus on training to the scaling of inference, processors like the Ryzen AI Max+ 395 give PC users a solid future-proofing foundation that helps sustain the ROI of their PC investment, even this early in the refresh cycle.
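
The effect of MCP overhead on a context window can be sketched as a token budget. Every token count below is a hypothetical figure chosen for illustration, not a measurement from AMD's demos, and the function name is an assumption; only the 256,000-token ceiling comes from the article.

```python
def remaining_context(window: int, tool_docs: int,
                      tool_returns: list[int], conversation: int) -> int:
    """Tokens left in the window after MCP overhead and conversation.

    A negative result means the workflow no longer fits and earlier
    context would have to be truncated or summarized.
    """
    used = tool_docs + sum(tool_returns) + conversation
    return window - used


# Hypothetical agent session: tool documentation loaded up front,
# two tool calls returning large payloads, plus the running conversation.
TOOL_DOCS = 6_000
TOOL_RETURNS = [9_000, 12_000]
CONVERSATION = 4_000

for window in (16_000, 32_000, 256_000):
    left = remaining_context(window, TOOL_DOCS, TOOL_RETURNS, CONVERSATION)
    status = "ok" if left >= 0 else "overflow"
    print(f"window {window:>7,}: {left:>8,} tokens left ({status})")
```

In this illustrative session the tool documentation and returns alone exceed a small window, while the largest window leaves ample room for further steps, which is the practical meaning of keeping multi-tool workflows stable.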

What to Watch:

  • Adoption of AMD’s Variable Graphics Memory upgrade by developers targeting large-scale local LLM and VLM deployments.
  • User uptake of extended context length capabilities for MCP-driven, token-heavy workflows.
  • Performance scaling across various quantization levels and precision modes in real-world scenarios.
  • Competitive responses from other vendors offering local AI model execution on Windows devices.

See the complete announcement on AMD’s Variable Graphics Memory upgrade for Ryzen AI Max+ 395 on the AMD website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.

Other insights from Futurum:

AMD Q2 FY 2025 Sales Beat Offset by MI308-Linked EPS Decline

AI PCs, Ryzen, and the Next Frontier in Personal Computing – Six Five On The Road

AMD Expands Telecom Role as Nokia Selects EPYC for 5G Cloud Platform

Image Credit: AMD

Author Information

Olivier Blanchard

Olivier Blanchard is Research Director, Intelligent Devices. He covers edge semiconductors and intelligent AI-capable devices for Futurum. In addition to having co-authored several books about digital transformation and AI with Futurum Group CEO Daniel Newman, Blanchard brings considerable experience demystifying new and emerging technologies, advising clients on how best to future-proof their organizations, and helping maximize the positive impacts of technology disruption while mitigating their potentially negative effects. Follow his extended analysis on X and LinkedIn.
