Under The Hood: How Microsoft Copilot Tames LLM Issues

The News: In September, Microsoft unveiled a series of announcements that will bring AI into some of the most used software applications on the planet, Windows OS and Microsoft 365, as well as Bing and Edge.

The big announcement was Microsoft’s introduction of Copilot. According to press materials, Copilot “will be your everyday AI companion. Copilot will uniquely incorporate the context and intelligence of the web, your work data and what you are doing in the moment on your PC to provide better assistance – with your privacy and security at the forefront. It will be a simple and seamless experience, available in Windows 11, Microsoft 365, and in our web browser with Edge and Bing. It will work as an app or reveal itself when you need it with a right click. We will continue to add capabilities and connections to Copilot across to our most-used applications over time in service of our vision to have one experience that works across your whole life.”

Microsoft 365 Copilot will be generally available to commercial customers starting November 1, with a more powerful version of M365 Chat and new capabilities for Copilot in Outlook, Excel, Loop, OneNote, OneDrive, and Word. Read the full details of the Microsoft Copilot announcements here.

Analyst Take: Large language models (LLMs) have several built-in challenges, primarily accuracy, bias, and hallucination. These challenges can pose significant risks for companies leveraging LLMs. Microsoft’s investment in and partnership with OpenAI and the ubiquitous ChatGPT had me wondering how Microsoft is leveraging OpenAI and Microsoft’s own LLM intellectual property (IP) with Copilot. After the New York announcement, I had some questions for Microsoft, which the company kindly answered.

The question we will focus on today was: How will Microsoft deal with/solve the built-in challenges LLMs have, primarily accuracy, bias, and hallucination, in the deployment of Copilot across Windows, 365, Bing, etc.?

Microsoft provided a look under the hood as to how it is addressing these issues, specifically for Microsoft 365 Copilot. Interestingly, Microsoft has published an article on Microsoft Learn, Microsoft’s documentation, training, and certification portal, that addresses many of the LLM concerns: Data, Privacy, and Security of Microsoft 365 Copilot.

Accuracy

Grounding is the practice of giving an LLM access to data that is not part of its training data. Copilot combines LLMs with content in the Microsoft Graph (emails, chats, and documents you have permission to access) and Microsoft 365 apps. Importantly, Microsoft Graph gives Copilot access to not only the content but also the context of that content, such as email exchanges the user had on a topic. Copilot generates responses anchored in your organizational data and nothing else. In essence, the LLM is compartmentalized to certain tasks for Copilot but not to others.

Looking further, Copilot uses only organizational data “to which individual users have at least view permissions.” It only searches for information from the user’s tenant.
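To make the idea concrete, here is a minimal Python sketch of permission-trimmed grounding as described above: only content from the user's own tenant, and only items the user can at least view, is allowed into the prompt. This is an illustration under stated assumptions, not Microsoft's implementation. The `Document` type, the function names, the keyword matching, and the sample tenant and user names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for an item such as an email, chat, or file."""
    doc_id: str
    tenant_id: str
    viewers: frozenset  # user IDs holding at least view permission
    text: str


def retrieve_for_grounding(user_id, tenant_id, query, documents):
    """Return documents in the user's tenant that the user may view
    and that share at least one word with the query."""
    terms = {t.lower() for t in query.split()}
    results = []
    for doc in documents:
        if doc.tenant_id != tenant_id:
            continue  # never cross the tenant boundary
        if user_id not in doc.viewers:
            continue  # the user lacks view permission
        if terms & set(doc.text.lower().split()):
            results.append(doc)
    return results


def build_grounded_prompt(question, grounding_docs):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in grounding_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical sample data: two tenants, two users.
DOCS = [
    Document("mail-1", "contoso", frozenset({"alice", "bob"}),
             "Q3 budget review moved to Friday"),
    Document("file-2", "contoso", frozenset({"bob"}),
             "Confidential budget forecast for Q4"),
    Document("mail-3", "fabrikam", frozenset({"alice"}),
             "Budget notes from partner tenant"),
]

hits = retrieve_for_grounding("alice", "contoso", "budget", DOCS)
prompt = build_grounded_prompt("When is the budget review?", hits)
```

In this toy data, a query from `alice` in the `contoso` tenant retrieves only `mail-1`: `file-2` is excluded because she lacks view permission, and `mail-3` is excluded because it lives in another tenant, even though she can view it there.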

Accuracy Caveat

Microsoft acknowledges there will be issues with accuracy. The company’s primary suggestion for dealing with them: do not depend on Copilot to fully automate the writing of drafts and summaries.

“The responses that generative AI produces aren’t guaranteed to be 100% factual. While we continue to improve responses, users should still use their judgment when reviewing the output before sending them to others. Our Microsoft 365 Copilot capabilities provide useful drafts and summaries to help you achieve more while giving you a chance to review the generated AI rather than fully automating these tasks.”

Addressing misinformation and disinformation within Copilot is still a work in progress:

“We continue to improve algorithms to proactively address issues, such as misinformation and disinformation, content blocking, data safety, and preventing the promotion of harmful or discriminatory content in line with our responsible AI principles.”

Privacy and Security

Microsoft has a well-thought-out plan for protecting the user data of Microsoft 365 Copilot enterprise customers from LLMs.

Copilot complies with the General Data Protection Regulation (GDPR) and the European Union (EU) Data Boundary. No user data or activity accessed through Microsoft Graph is used to train LLMs. As previously mentioned, within user data, Copilot only surfaces organizational data to which individual users have at least view permissions, though it should be noted that organizations must make sure they are using the permission models available in 365 apps. Copilot only searches for information from the user’s tenant. It cannot search other tenants the user might have access to. User prompts, the data Copilot retrieves, and the responses generated remain within the Microsoft 365 boundary. Microsoft makes a point of noting that Copilot uses “Azure OpenAI services for processing, not OpenAI’s publicly available services.”

For the grounding process, something called the Semantic Index ensures the grounding is based only on the content that the current user is authorized to access.

These are just highlights of the protections outlined in the Learn article. There are further details about encryption, sensitivity labels, restricted permissions, isolation controls, and details of compliance to the EU Data Boundary.

Bias

According to Microsoft, Copilot leverages a safety system including content filtering, operational tracking, and abuse detection to provide a safe search experience.

Hallucination

Hallucination will continue to be the Achilles’ heel of LLMs, and Copilot will hallucinate. Microsoft’s approach includes two initiatives: prompt design and user rating/feedback.

Regarding prompt design, the Copilot launch event promoted the idea that all of us will have to adopt new interaction techniques to take advantage of LLM-based systems. Microsoft said it will offer training and how-tos on how users should write prompts.

For user ratings, Microsoft might employ user feedback to improve the model. For example, users can rate each response to indicate if the response is helpful or not and provide additional detailed feedback for their ratings.
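As a rough illustration of how such ratings could be put to use, the Python sketch below aggregates helpful/not-helpful votes per response and flags poorly rated responses for review. It is only a sketch under assumptions: the `Feedback` record, the function names, the 0.5 threshold, and the sample data are hypothetical, and Microsoft has not described its feedback pipeline at this level of detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical record of one user's rating of one response."""
    response_id: str
    helpful: bool
    comment: str = ""


def helpful_rate_by_response(feedback_items):
    """Return the fraction of helpful ratings for each response ID."""
    counts = defaultdict(lambda: [0, 0])  # response_id -> [helpful, total]
    for item in feedback_items:
        counts[item.response_id][1] += 1
        if item.helpful:
            counts[item.response_id][0] += 1
    return {rid: helpful / total for rid, (helpful, total) in counts.items()}


def flag_for_review(rates, threshold=0.5):
    """Return response IDs whose helpful rate falls below the threshold."""
    return sorted(rid for rid, rate in rates.items() if rate < threshold)


# Hypothetical sample feedback for two responses.
FEEDBACK = [
    Feedback("r1", True),
    Feedback("r1", True),
    Feedback("r2", False, "Cited a meeting that never happened"),
    Feedback("r2", True),
    Feedback("r2", False, "Wrong project name"),
]

rates = helpful_rate_by_response(FEEDBACK)
flagged = flag_for_review(rates)
```

With this sample data, `r1` is rated helpful by every user and `r2` by one user in three, so only `r2` is flagged. The free-text comments are where detailed feedback, such as a suspected hallucination, would be captured.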

Conclusions

Microsoft 365 Copilot is going to immerse an enormous number of users in LLM-based AI. The company seems to be confident that the inherent challenges of current LLMs will not materially affect outcomes for Copilot.

Of the main challenges – accuracy, privacy, bias, and hallucination – Microsoft is on solid footing for privacy and security. For bias, it is hard to say whether the stated controls will be effective; only time will tell.

In terms of accuracy, Microsoft clearly sees grounding as the main control. This approach should reduce inaccuracy significantly, but it is interesting to see the company state that many Copilot outputs should be seen as drafts and should not be treated as final output without user review. Here is where the path forward gets tricky – will users heed this advice? Will they learn quickly enough to avoid major issues? Will they become disenchanted and abandon Copilot? Worse, will they simply go with automated output and not care? Microsoft also does not yet have a strong deterrent for misinformation and disinformation.

Hallucination will also be an issue because, at this point, the onus is on users to design good prompts and to provide response feedback. The same questions raised for accuracy apply here – will users participate, will they actively educate themselves, and will enough of them provide response feedback?

In essence, a lot hinges on users taking an active role. Clearly, Microsoft knows this. The company is willing to bet that most users will adopt new behaviors to take advantage of these new capabilities, much as we all adapted to web search, the mouse-driven GUI, and texting. Microsoft seems to acknowledge there will be bumps in the road and to believe they can be fixed. A great analogy for these times – flying a plane while building it.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually, informed by data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Microsoft Copilot Will Be the AI Inflection Point

Microsoft Earning July 2023

Microsoft’s Zero-Upcharge Copilot Strategy May Elevate GenAI Adoption

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
