Microsoft Orca 2: The Biggest Generative AI Breakthrough Since ChatGPT

The News: On November 20, Microsoft Research announced the debut of Orca 2, the company’s latest step in exploring the capabilities of smaller language models (LMs), which it defines as models with roughly 10 billion parameters or fewer. Here are the key details:

  • Orca 2 significantly surpasses models of similar size (including the original Orca 13B model), attaining performance levels “similar to or better than models 5-10 times larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings.”
  • In benchmark testing for language understanding, common sense reasoning, multistep reasoning, math problem-solving, reading comprehension, summarizing, groundedness, truthfulness, and toxic content generation and identification, Orca 2 outperformed large LMs (LLMs) Llama 2 Chat 13B, Llama 2 Chat 70B, WizardLM 13B, and WizardLM 70B.
  • The model comes in two sizes, 7 billion and 13 billion parameters.
  • The models were created by fine-tuning Llama 2 base models on tailored synthetic data.
  • Microsoft researchers say the key to Orca 2’s performance is teaching it solution strategies different from those a larger model would use. For example, an LLM such as GPT-4 can answer complex tasks directly, while a smaller model may benefit from breaking the task into steps.
    • The training data was generated such that it teaches Orca 2 various reasoning techniques, such as step-by-step processing, recall then generate, recall-reason-generate, extract-generate, and direct answer methods, while teaching it to choose different solution strategies for different tasks.
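The strategy-selection idea above can be illustrated with a toy sketch. This is not Microsoft’s training pipeline; the routing heuristic and prompt wording are hypothetical, and only the strategy names come from the announcement. The sketch shows how synthetic training data might pair each task with an instruction for a different reasoning strategy:

```python
# Illustrative sketch only: pair a task with a reasoning strategy,
# mirroring the idea that a small model is taught to choose a strategy
# rather than always answering directly. Strategy names follow the
# Orca 2 announcement; the routing heuristic is a hypothetical stand-in.

STRATEGY_PROMPTS = {
    "step-by-step": "Solve the task one step at a time, showing your work.",
    "recall-then-generate": "First recall relevant facts, then compose the answer.",
    "direct-answer": "Answer directly and concisely.",
}

def choose_strategy(task: str) -> str:
    """Pick a reasoning strategy for a task (hypothetical heuristic)."""
    task_lower = task.lower()
    if any(word in task_lower for word in ("calculate", "how many", "solve")):
        return "step-by-step"
    if any(word in task_lower for word in ("who ", "when ", "where ")):
        return "recall-then-generate"
    return "direct-answer"

def build_prompt(task: str) -> str:
    """Prepend the chosen strategy instruction to the task, roughly how
    synthetic training data for a small model might be framed."""
    return f"{STRATEGY_PROMPTS[choose_strategy(task)]}\n\nTask: {task}"

print(build_prompt("How many primes are below 20?"))
```

In the actual Orca 2 work, the strategy choice is learned from demonstrations generated by a more capable model, not hard-coded rules like these.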

Read more in the blog post “Orca 2: Teaching Small Language Models How to Reason” on the Microsoft website.

Analyst Take: Microsoft’s Orca 2 is perhaps one of the most significant breakthroughs in generative AI since the release of ChatGPT. Here is why:

AI Compute Workloads Will Get Smaller

The massive compute workloads required to train LLMs have worried the AI ecosystem since ChatGPT launched. Could enterprises afford the compute required to train and run (inference) AI models? Would the economics work against commercial AI solutions? High-performing smaller models will cost less to support than larger LLMs and ensure better paths to commercialized AI.

AI Chip Market Dynamics Change

Market bottlenecks are occurring because massive AI compute workloads require the most powerful graphics processing units (GPUs) for AI training. NVIDIA cannot make enough of its coveted GPUs to meet market demand, while other chip players have scrambled to create purpose-built chips for AI. As AI compute workloads shrink with the emergence of smaller LMs, chips from a slew of other makers become viable options for AI compute.

Smaller LMs Now Perform on Par With or Better Than LLMs

It was long assumed that LLMs would outperform smaller LMs. There is now a string of smaller models performing as well as or better than the largest models. It started with Meta’s Llama 2 13B, and now Orca 2’s 13B and 7B models outperform Llama 2 13B as well as Mistral AI’s 7B. These improvements come from changing how models learn, as evidenced by Orca 2’s approach. Better performance for less cost? That would seem to be the way for most enterprises to go.

Unleashing On-Device AI

As smaller LMs get both smaller and better, potent use cases for on-device AI for smartphones and PCs become more likely. Look for on-device AI champions such as Qualcomm, Dell Technologies, and others to parlay smaller LMs into on-device AI.

Conclusion

Orca 2 is a significant breakthrough for the commercialization of AI. Microsoft was savvy to invest in the improvement of LMs, particularly as the company thinks about PCs and the emergence of on-device AI. It is also further proof that Microsoft is not reliant on OpenAI’s GPT models. Look for further rapid refinement of smaller LMs and their rapid adoption for enterprise use.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Microsoft Ignite Showcases AI Advancements with Copilot in Teams

Microsoft’s AI Safety Policies: Best Practice

Under The Hood: How Microsoft Copilot Tames LLM Issues

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and he holds a Bachelor of Science from the University of Florida.
