The News: In September, Microsoft made a series of announcements that will bring AI into some of the most used software applications on the planet: Windows, Microsoft 365, Bing, and Edge.
The big announcement was Microsoft’s introduction of Copilot. According to press materials, Copilot “will be your everyday AI companion. Copilot will uniquely incorporate the context and intelligence of the web, your work data and what you are doing in the moment on your PC to provide better assistance – with your privacy and security at the forefront. It will be a simple and seamless experience, available in Windows 11, Microsoft 365, and in our web browser with Edge and Bing. It will work as an app or reveal itself when you need it with a right click. We will continue to add capabilities and connections to Copilot across to our most-used applications over time in service of our vision to have one experience that works across your whole life.”
Microsoft 365 Copilot will generally be available to commercial customers starting November 1 with a more powerful version of M365 Chat and new capabilities for Copilot in Outlook, Excel, Loop, OneNote, OneDrive, and Word. Read the full details of the Microsoft Copilot announcements here.
Under The Hood: How Microsoft Copilot Tames LLM Issues
Analyst Take: Large language models (LLMs) have several built-in challenges, primarily accuracy, bias, and hallucination. These challenges can pose significant risks for companies leveraging LLMs. Microsoft’s investment in and partnership with OpenAI, maker of the ubiquitous ChatGPT, had me wondering how Microsoft is leveraging OpenAI’s technology and its own LLM intellectual property (IP) in Copilot. After the New York announcement, I had some questions for Microsoft, which the company kindly answered.
The question we will focus on today was: How will Microsoft deal with/solve the built-in challenges LLMs have, primarily accuracy, bias, and hallucination, in the deployment of Copilot across Windows, 365, Bing, etc.?
Microsoft provided a look under the hood as to how it is addressing these issues, specifically for Microsoft 365 Copilot. Interestingly, Microsoft has published an article on Microsoft Learn, Microsoft’s documentation, training, and certification portal, that addresses many of the LLM concerns: Data, Privacy, and Security of Microsoft 365 Copilot.
Accuracy
Giving an LLM access to data that is not part of its training data is called grounding. Copilot combines LLMs with content in the Microsoft Graph (emails, chats, and documents you have permission to access) and the Microsoft 365 apps. Importantly, Microsoft Graph gives Copilot access not only to the content but also to the context of that content – such as the email exchanges a user has had on a topic. Copilot generates responses anchored in your organizational data and nothing else. In essence, the LLM is compartmentalized: within Copilot it performs certain tasks and no others.
Looking further, Copilot uses only organizational data “to which individual users have at least view permissions.” It only searches for information from the user’s tenant.
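To make the grounding flow concrete, here is a minimal Python sketch of what permission-trimmed retrieval feeding an LLM prompt could look like. Everything here – the Document structure, the function names, the prompt wording – is a hypothetical illustration of the technique, not Microsoft’s actual implementation or API.

```python
# Hypothetical sketch of permission-aware grounding (not Microsoft's actual API).
from dataclasses import dataclass

@dataclass
class Document:
    content: str
    allowed_users: set[str]  # users with at least view permission

def ground_prompt(user_id: str, user_prompt: str, graph_docs: list[Document]) -> str:
    # Retrieve only the content the current user is permitted to view.
    visible = [d.content for d in graph_docs if user_id in d.allowed_users]
    # Anchor the LLM to organizational data and nothing else.
    context = "\n---\n".join(visible) if visible else "(no accessible content)"
    return (
        "Answer using ONLY the organizational content below.\n"
        f"Content:\n{context}\n\nQuestion: {user_prompt}"
    )

# Usage: the grounded prompt is then sent to the model.
docs = [Document("Q3 budget email thread", {"alice"}),
        Document("HR policy draft", {"bob"})]
print(ground_prompt("alice", "Summarize the Q3 budget discussion.", docs))
```

The key design point is that permission trimming happens before the model ever sees the data, so the LLM cannot leak content the user could not have opened themselves.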
Accuracy Caveat
Microsoft acknowledges there will be issues with accuracy. The company’s primary suggestion for dealing with them is to not depend on Copilot to fully automate drafting and summarization:
“The responses that generative AI produces aren’t guaranteed to be 100% factual. While we continue to improve responses, users should still use their judgment when reviewing the output before sending them to others. Our Microsoft 365 Copilot capabilities provide useful drafts and summaries to help you achieve more while giving you a chance to review the generated AI rather than fully automating these tasks.”
Regarding the accuracy issues of misinformation and disinformation, defeating that issue within Copilot is a work in progress:
“We continue to improve algorithms to proactively address issues, such as misinformation and disinformation, content blocking, data safety, and preventing the promotion of harmful or discriminatory content in line with our responsible AI principles.”
Privacy and Security
Microsoft has a well-thought-through plan for protecting enterprise customers’ user data from LLMs in Microsoft 365 Copilot.
Copilot is General Data Protection Regulation (GDPR) and European Union (EU) Data Boundary compliant. No user data or activity accessed through Microsoft Graph is used to train LLMs. As previously mentioned, Copilot only surfaces organizational data to which individual users have at least view permissions, though organizations must make sure they are using the permission models available in Microsoft 365 apps. Copilot only searches for information from the user’s tenant; it cannot search other tenants the user might have access to. User prompts, the data Copilot retrieves, and the responses generated all remain within the Microsoft 365 boundary. Microsoft makes a point that Copilot uses “Azure OpenAI services for processing, not OpenAI’s publicly available services.”
For the grounding process, something called the Semantic Index ensures the grounding is based only on the content that the current user is authorized to access.
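As a rough sketch of what permission-aware semantic retrieval could look like, the following ranks only those index entries that pass tenant and permission checks before any similarity scoring. The index layout and scoring below are assumptions for illustration, not the actual Semantic Index design.

```python
# Illustrative permission-filtered semantic search (assumed design, not Microsoft's).
def semantic_search(query_vec, index, user_id, tenant_id, top_k=3):
    """index: list of (embedding, metadata) where metadata carries tenant and ACL."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    candidates = [
        (cosine(query_vec, emb), meta)
        for emb, meta in index
        # Enforce the tenant boundary and view permission before ranking.
        if meta["tenant"] == tenant_id and user_id in meta["viewers"]
    ]
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:top_k]
```

Filtering before ranking, rather than after, means unauthorized content never even competes for a slot in the grounding context.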
These are just highlights of the protections outlined in the Learn article. There are further details about encryption, sensitivity labels, restricted permissions, isolation controls, and details of compliance to the EU Data Boundary.
Bias
According to Microsoft, Copilot leverages a safety system, including content filtering, operational tracking, and abuse detection, to provide a safe search experience.
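For illustration, a safety system like that is often layered around the model as pre- and post-generation filters with logging for abuse detection. The following is a minimal sketch under those assumptions; the filter list and log format are placeholders, not Microsoft’s.

```python
# Hypothetical layered safety pipeline (content filtering + operational tracking).
BLOCKLIST = {"harmful_term_1", "harmful_term_2"}  # placeholder filter list

def passes_filter(text: str) -> bool:
    """Return True if the text clears the content filter."""
    return not any(term in text.lower() for term in BLOCKLIST)

def safe_respond(user_prompt: str, generate, audit_log: list) -> str:
    if not passes_filter(user_prompt):                      # pre-filter the prompt
        audit_log.append(("blocked_prompt", user_prompt))   # abuse detection signal
        return "Your request could not be processed."
    response = generate(user_prompt)
    if not passes_filter(response):                         # post-filter the output
        audit_log.append(("blocked_response", response))
        return "The generated response was withheld by the safety system."
    return response
```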
Hallucination
Hallucination will continue to be the Achilles’ heel of LLMs, and Copilot will hallucinate. Microsoft’s approach includes two initiatives: prompt design and user rating/feedback.
Regarding prompt design, the idea that all of us will have to adopt new interaction techniques to take advantage of LLM-based systems was promoted at the Copilot launch event. Microsoft said it will offer training and how-tos on how users should write prompts.
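As an illustration of what such prompt guidance often boils down to – stating the goal, the context, and the desired format explicitly – here is a hypothetical example; the helper function and the sample project are invented for illustration.

```python
# One common prompt-writing pattern (an assumption about what such training covers).
def build_prompt(goal: str, context: str, output_format: str) -> str:
    return (
        f"Goal: {goal}\n"
        f"Context: {context}\n"
        f"Format: {output_format}"
    )

print(build_prompt(
    goal="Draft a status update for the Contoso project.",  # hypothetical project
    context="Use last week's email thread and the Loop notes.",
    output_format="Three bullet points, neutral tone.",
))
```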
For user ratings, Microsoft may use feedback to improve the model. For example, users can rate each response to indicate whether it is helpful and provide additional detailed feedback to explain their ratings.
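A minimal sketch of what response-level feedback capture could look like follows; the event schema is an assumption, not Microsoft’s telemetry format.

```python
# Illustrative thumbs-up/down feedback capture (assumed schema, not Microsoft's).
import json
import time

def record_feedback(response_id: str, helpful: bool, detail: str = "") -> dict:
    """Package a user's rating of a single generated response."""
    event = {
        "response_id": response_id,
        "helpful": helpful,       # the helpful / not helpful rating
        "detail": detail,         # optional free-text explanation
        "timestamp": time.time(),
    }
    # In practice this event would go to a telemetry pipeline for model improvement.
    print(json.dumps(event))
    return event

record_feedback("resp-123", helpful=False, detail="Summary missed the deadline change.")
```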
Conclusions
Microsoft 365 Copilot is going to immerse an enormous number of users in LLM-based AI. The company seems to be confident that the inherent challenges of current LLMs will not materially affect outcomes for Copilot.
Of the main challenges – accuracy, privacy and security, bias, and hallucination – Microsoft is on solid footing for privacy and security. For bias, it is hard to say whether the stated controls will be effective; only time will tell.
In terms of accuracy, Microsoft clearly sees grounding as the main control. This approach should reduce inaccuracy significantly, but it is notable that the company states many Copilot outputs should be seen as drafts and not treated as final output without user review. Here is where the path forward gets tricky: will users heed this advice? Will they learn quickly enough to avoid major issues? Will they become disenchanted and abandon the tool? Worse, will they simply go with the automated output and not care? Microsoft also does not yet have a strong deterrent for misinformation and disinformation.
Hallucination will also be an issue because, at this point, the onus is on users to design good prompts and to provide response feedback. The same issue as with accuracy applies here: will users participate, will they actively educate themselves, and will enough of them provide response feedback?
In essence, a lot hinges on users taking an active role. Clearly, Microsoft knows this. The company is willing to bet that most users will adopt new behaviors, much as we all adapted to web search, the mouse-driven GUI, and texting to take advantage of those new capabilities. Microsoft seems to acknowledge there will be bumps in the road that can be fixed. A great analogy for these times: flying a plane while building it.
Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.
Other insights from The Futurum Group:
Microsoft Copilot Will Be the AI Inflection Point
Microsoft’s Zero-Upcharge Copilot Strategy May Elevate GenAI Adoption
Author Information
Mark comes to The Futurum Group from Omdia’s Artificial Intelligence practice, where his focus was on natural language and AI use cases.
Previously, Mark worked as a consultant and analyst providing custom and syndicated qualitative market analysis, with an emphasis on mobile technology and on identifying trends and opportunities, for companies like Syniverse and ABI Research. He has been cited by international media outlets including CNBC, The Wall Street Journal, Bloomberg Businessweek, and CNET. Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and he holds a Bachelor of Science from the University of Florida.