Is Microsoft 365 Copilot Agent Mode Ready to Rival Human Accuracy?

Analyst(s): Nick Patience
Publication Date: October 2, 2025

Microsoft has launched Agent Mode in Excel and Word, alongside Office Agent in Copilot chat. These features allow users to steer Copilot through multi-step tasks, producing spreadsheets, documents, and presentations with higher accuracy and interactivity.

What is Covered in this Article:

  • Agent Mode in Excel adds multi-step orchestration, benchmarked at 57.2% accuracy vs 71.3% for humans.
  • Word gains conversational “vibe writing” for drafting, refining, and formatting documents.
  • Office Agent in Copilot chat (Anthropic-powered) builds PowerPoint and Word through chat-first workflows.
  • Microsoft blends OpenAI and Anthropic models, assigning each to distinct Copilot roles.

The News: Microsoft has rolled out Agent Mode in Excel and Word, along with Office Agent in Copilot chat, introducing what it calls “vibe working.” Agent Mode brings multi-step automation to Excel and Word, while Office Agent lets users generate Word and PowerPoint content straight from chat prompts.

Agent Mode is available now for Microsoft 365 Copilot users and Microsoft 365 Personal or Family subscribers through the Frontier program. Excel and Word will be supported on the web, and PowerPoint will be coming soon. Office Agent, powered by Anthropic models, is also launching in the U.S. for Personal and Family subscribers.

Analyst Take: The release of Agent Mode and Office Agent marks a shift in Microsoft’s Copilot strategy, embedding agent-driven workflows directly into tools people use every day. By adding advanced orchestration and reasoning inside Excel, Word, and PowerPoint, Microsoft aims to make powerful functionality more accessible while creating a new way for teams to collaborate.

Expanding Excel Beyond Experts

Excel has long been essential for everything from simple budgets to corporate finance, but its deeper features have mostly been used by experts. Agent Mode changes this by allowing Copilot to “speak Excel” natively through OpenAI’s latest reasoning models. Instead of just generating results, it can check outputs, fix mistakes, and rerun processes until they are correct. Microsoft benchmarked Agent Mode at 57.2% accuracy on SpreadsheetBench – below the 71.3% human score, but ahead of Shortcut.ai and Claude Files. While not flawless, this is a big step in making advanced Excel capabilities usable for non-experts.
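The check, fix, and rerun behavior described above can be pictured as a generate-and-verify loop. The Python below is a minimal sketch under stated assumptions, not Microsoft's implementation: `run_until_valid`, `toy_generate`, and `toy_validate` are hypothetical names introduced for illustration, and the toy generator stands in for a model call.

```python
from typing import Callable, Optional, Tuple


def run_until_valid(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate a result, check it, and retry with feedback until it passes.

    All names are hypothetical; this only illustrates the shape of the loop.
    Returns (result, attempts_used); result is None if every attempt failed.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)   # produce an output, using prior feedback if any
        feedback = validate(candidate)   # None means the check passed
        if feedback is None:
            return candidate, attempt
    return None, max_attempts


# Toy stand-ins for a model call and a spreadsheet check (assumptions).
def toy_generate(feedback: Optional[str]) -> str:
    # The first try omits the leading "="; the retry fixes it after feedback.
    return "=SUM(A1:A3)" if feedback else "SUM(A1:A3)"


def toy_validate(formula: str) -> Optional[str]:
    return None if formula.startswith("=") else "formula must start with '='"


result, attempts = run_until_valid(toy_generate, toy_validate)
print(result, attempts)
```

The attempt cap matters in a loop like this: without it, an output that never passes validation would rerun indefinitely, so the sketch returns `None` once the budget is spent.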

Conversational Writing in Word

Agent Mode in Word reimagines document creation as “vibe writing” – a more interactive process where Copilot drafts, refines, and asks questions as you go. Rather than one-off outputs, it stays in conversation, folding in feedback and applying proper formatting. Example uses include summarizing customer reviews, updating reports, or polishing documents to match branding. By mixing drafting with iteration, Microsoft hopes to speed up writing tasks while still keeping users in control.

Office Agent’s Chat-First Creation Workflow

Office Agent brings Copilot into chat to generate complete PowerPoint decks and Word documents using Anthropic models. The process starts by clarifying length, theme, focus areas, and audience details. It then conducts web research with a visible reasoning trail and live slide previews before using code generation and quality checks to deliver polished content. This chat-first setup pulls clarification, research, generation, and revision into one workflow.

Model Composition Across Copilot

Office apps are powered by OpenAI models, with Agent Mode using GPT-5 for step-by-step task execution, while Office Agent in Copilot chat runs on Anthropic models. Microsoft signals an ongoing commitment to OpenAI but is also building a broader model family to match different strengths and needs. Anthropic models are now appearing in Microsoft 365 apps through Copilot chat, showing a clear strategy of mixing providers across the portfolio. In practice, this means selecting a model for each role within Copilot rather than relying on a single provider for every task.
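The role-based split described above amounts to a lookup from Copilot role to model provider. The Python below is a minimal sketch of that idea, assuming a static table: the role keys, the `ModelChoice` type, and `pick_model` are hypothetical names for illustration, and only the provider assignments come from the article. How Microsoft actually routes requests is not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    provider: str
    note: str


# Provider assignments follow the article; the role keys are hypothetical labels.
ROLE_TO_MODEL = {
    "agent_mode": ModelChoice("OpenAI", "GPT-5, step-by-step task execution"),
    "office_agent": ModelChoice("Anthropic", "chat-first Word and PowerPoint generation"),
}


def pick_model(role: str) -> ModelChoice:
    """Return the configured model for a role, failing loudly on unknown roles."""
    try:
        return ROLE_TO_MODEL[role]
    except KeyError:
        raise ValueError(f"no model configured for role {role!r}") from None


print(pick_model("agent_mode").provider)
print(pick_model("office_agent").provider)
```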

What to Watch:

  • Whether users adopt Agent Mode given its 57.2% accuracy on SpreadsheetBench, 14.1 percentage points below the 71.3% human benchmark.
  • Expansion of Office Agent beyond U.S. Personal/Family subscribers to commercial customers.
  • Integration of PowerPoint into Agent Mode following Excel and Word.
  • The interplay between OpenAI-powered Office apps and Anthropic-powered Copilot chat.

See the complete blog post on the introduction of Agent Mode and Office Agent on the Microsoft website.

Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of Futurum as a whole.


Author Information

Nick Patience is VP and Practice Lead for AI Platforms at The Futurum Group. Nick is a thought leader on AI development, deployment, and adoption - an area he has researched for 25 years. Before Futurum, Nick was a Managing Analyst with S&P Global Market Intelligence, responsible for 451 Research’s coverage of Data, AI, Analytics, Information Security, and Risk. Nick became part of S&P Global through its 2019 acquisition of 451 Research, a pioneering analyst firm that Nick co-founded in 1999. He is a sought-after speaker and advisor, known for his expertise in the drivers of AI adoption, industry use cases, and the infrastructure behind its development and deployment. Nick also spent three years as a product marketing lead at Recommind (now part of OpenText), a machine learning-driven eDiscovery software company. Nick is based in London.
