Microsoft Orca 2: The Biggest Generative AI Breakthrough Since ChatGPT

The News: On November 20, Microsoft Research announced the debut of Orca 2, the company’s latest step in exploring the capabilities of smaller language models (LMs), which they define as 10 billion parameters or less. Here are the key details:

  • Orca 2 significantly surpasses models of similar size (including the original Orca 13B model), attaining performance levels “similar to or better than models 5-10 times larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings.”
  • In benchmark testing for language understanding, common sense reasoning, multistep reasoning, math problem-solving, reading comprehension, summarizing, groundedness, truthfulness, and toxic content generation and identification, Orca 2 outperformed large LMs (LLMs) Llama 2 Chat 13B, Llama 2 Chat 70B, WizardLM 13B, and WizardLM 70B.
  • The model comes in two sizes, 7 billion and 13 billion parameters.
  • The models were created by fine-tuning Llama 2 base models on tailored synthetic data.
  • Microsoft researchers say the key to Orca 2’s performance is using different solution strategies for learning from LLMs. For example, an LLM such as GPT-4 can answer complex tasks directly while a smaller model might benefit from breaking the task into steps.
    • The training data was generated such that it teaches Orca 2 various reasoning techniques, such as step-by-step processing, recall then generate, recall-reason-generate, extract-generate, and direct answer methods, while teaching it to choose different solution strategies for different tasks.
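The strategy-matching idea described above can be sketched in a few lines of code. This is a toy illustration only, not Microsoft's training pipeline: the strategy names mirror those listed in the announcement, but the task categories, routing table, and prompt wording are invented for this example.

```python
# Toy sketch of "different solution strategies for different tasks":
# route each (hypothetical) task category to one of the reasoning
# strategies named in the Orca 2 announcement, then prepend the
# matching instruction to the question.

STRATEGY_PROMPTS = {
    "step-by-step": "Solve the problem one step at a time, showing your work.",
    "recall-then-generate": "First recall the relevant facts, then compose the answer.",
    "recall-reason-generate": "Recall facts, reason over them, then write the answer.",
    "extract-generate": "Extract the key passage from the context, then answer from it.",
    "direct-answer": "Answer directly and concisely.",
}

def pick_strategy(task_type: str) -> str:
    """Map a (hypothetical) task category to a reasoning strategy."""
    routing = {
        "math": "step-by-step",
        "multistep-reasoning": "step-by-step",
        "open-domain-qa": "recall-then-generate",
        "reading-comprehension": "extract-generate",
        "common-sense": "direct-answer",
    }
    # Fall back to answering directly when the task type is unrecognized.
    return routing.get(task_type, "direct-answer")

def build_prompt(task_type: str, question: str) -> str:
    """Prepend the chosen strategy instruction (system-style) to the question."""
    strategy = pick_strategy(task_type)
    return f"[{strategy}] {STRATEGY_PROMPTS[strategy]}\nQuestion: {question}"

print(build_prompt("math", "What is 17 * 24?"))
```

In Orca 2's actual training, the researchers describe baking these strategies into the synthetic training data itself, so the small model learns to select a strategy on its own rather than relying on an external router like the one above.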

Read more in the blog post “Orca 2: Teaching Small Language Models How to Reason” on the Microsoft website.


Analyst Take: Microsoft’s Orca 2 is perhaps one of the most significant breakthroughs in generative AI since the release of ChatGPT. Here is why:

AI Compute Workloads Will Get Smaller

The massive compute workloads required, particularly for training LLMs, have worried the AI ecosystem since ChatGPT launched. Could enterprises afford the compute required to train and run (inference) AI models? Would the economics work against commercial AI solutions? High-performing smaller models will cost less to support than larger LLMs and ensure better paths to commercialized AI.

AI Chip Market Dynamics Change

Market bottlenecks are occurring because the massive AI compute workloads require the most powerful graphics processing units (GPUs) to handle AI training. NVIDIA cannot make enough of its coveted GPUs to meet market demand, while other chip players have scrambled to create purpose-built chips for AI. As the AI compute workloads come down with the emergence of smaller LMs, other chips from a slew of other chip makers become viable options to handle AI compute.

Smaller LMs Now Match or Outperform LLMs

It was long assumed that LLMs would outperform smaller LMs. There is now a string of smaller models performing as well as or better than the largest LMs. It started with Meta’s Llama 2 13B, but now Orca 2’s 13B and 7B outperform Llama 2 13B as well as Mistral AI’s 7B. These improvements come from changing how models learn, as evidenced by Orca 2’s approach. Better performance for less cost? That would seem to be the way most enterprises will go.

Unleashing On-Device AI

As smaller LMs get both smaller and better, potent use cases for on-device AI for smartphones and PCs become more likely. Look for on-device AI champions such as Qualcomm, Dell Technologies, and others to parlay smaller LMs into on-device AI.

Conclusion

Orca 2 is a significant breakthrough for the commercialization of AI. Microsoft was savvy to invest in the improvement of LMs, particularly as the company thinks about PCs and the emergence of on-device AI. It is also further proof that Microsoft is not reliant on OpenAI’s GPT models. Look for further rapid refinement of smaller LMs and the rapid adoption of them for enterprise use.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Microsoft Ignite Showcases AI Advancements with Copilot in Teams

Microsoft’s AI Safety Policies: Best Practice

Under The Hood: How Microsoft Copilot Tames LLM Issues

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting technology business and holds a Bachelor of Science from the University of Florida.
