
Microsoft Orca 2: The Biggest Generative AI Breakthrough Since ChatGPT

The News: On November 20, Microsoft Research announced the debut of Orca 2, the company’s latest step in exploring the capabilities of smaller language models (LMs), which it defines as models of roughly 10 billion parameters or fewer. Here are the key details:

  • Orca 2 significantly surpasses models of similar size (including the original Orca 13B model), attaining performance levels “similar to or better than models 5-10 times larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings.”
  • In benchmark testing for language understanding, common sense reasoning, multistep reasoning, math problem-solving, reading comprehension, summarization, groundedness, truthfulness, and toxic content generation and identification, Orca 2 outperformed the large LMs (LLMs) Llama 2 Chat 13B, Llama 2 Chat 70B, WizardLM 13B, and WizardLM 70B.
  • The model comes in two sizes, 7 billion and 13 billion parameters.
  • The models were created by fine-tuning Llama 2 base models on tailored synthetic data.
  • Microsoft researchers say the key to Orca 2’s performance is teaching it solution strategies suited to its capacity rather than having it simply imitate larger models. For example, an LLM such as GPT-4 can answer a complex task directly, while a smaller model may benefit from breaking the task into steps.
    • The training data was generated to teach Orca 2 various reasoning techniques, such as step-by-step processing, recall then generate, recall-reason-generate, extract-generate, and direct-answer methods, while also teaching it to choose a different solution strategy for each task (see the sketch after this list).
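To make the strategy-conditioning idea concrete, here is a minimal sketch of prompting an Orca 2 checkpoint with the Hugging Face transformers library. It assumes the released weights are available as microsoft/Orca-2-7b and uses the ChatML-style prompt format described on the model card; the system message and question are illustrative, not prescribed by Microsoft.

```python
# Minimal sketch: prompting Orca 2 via Hugging Face transformers.
# Assumes the checkpoint ID "microsoft/Orca-2-7b" and the ChatML-style
# prompt format from the model card; adjust to the actual release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Orca-2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The system message nudges the model toward cautious, stepwise reasoning;
# during training, Orca 2 saw different strategies (step-by-step, recall
# then generate, direct answer) paired with tasks where each works best.
system = ("You are Orca, an AI language model created by Microsoft. "
          "You are a cautious assistant. You carefully follow instructions.")
user = "John has 3 apples and buys 2 bags of 4 apples each. How many apples does he have?"
prompt = (f"<|im_start|>system\n{system}<|im_end|>\n"
          f"<|im_start|>user\n{user}<|im_end|>\n"
          f"<|im_start|>assistant")

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```

Notably, at inference time the model is not told which strategy to apply; the training regime is meant to teach it to pick an appropriate one on its own.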

Read more in the blog post “Orca 2: Teaching Small Language Models How to Reason” on the Microsoft website.

Analyst Take: Microsoft’s Orca 2 is perhaps one of the most significant breakthroughs in generative AI since the release of ChatGPT. Here is why:

AI Compute Workloads Will Get Smaller

The massive compute workloads required, particularly for training LLMs, have worried the AI ecosystem since ChatGPT launched. Could enterprises afford the compute required to train and run (inference) AI models? Would the economics work against commercial AI solutions? High-performing smaller models will cost less to support than larger LLMs and ensure a better path to commercialized AI.
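A back-of-envelope calculation shows why model size dominates serving costs: the memory needed just to hold the weights at inference time is parameter count times bytes per parameter. The sketch below applies that rule at fp16 precision (2 bytes per parameter); the figures are illustrative and ignore activations and KV cache, which add more.

```python
# Back-of-envelope: GPU memory to hold model weights at inference time.
# fp16 = 2 bytes per parameter; ignores activations and KV cache.
BYTES_PER_PARAM_FP16 = 2

def weight_memory_gb(params_billions: float) -> float:
    """Approximate weight footprint in GB for a model of the given size."""
    return params_billions * 1e9 * BYTES_PER_PARAM_FP16 / 1e9

for name, size_b in [("Orca 2 7B", 7), ("Orca 2 13B", 13), ("Llama 2 70B", 70)]:
    print(f"{name}: ~{weight_memory_gb(size_b):.0f} GB of fp16 weights")

# Prints roughly:
#   Orca 2 7B: ~14 GB of fp16 weights
#   Orca 2 13B: ~26 GB of fp16 weights
#   Llama 2 70B: ~140 GB of fp16 weights
```

By this rough math, a 13B model fits on a single mainstream accelerator, while a 70B model must be sharded across several, multiplying hardware, power, and operational cost.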

AI Chip Market Dynamics Change

Market bottlenecks are occurring because massive AI compute workloads require the most powerful graphics processing units (GPUs) to handle AI training. NVIDIA cannot make enough of its coveted GPUs to meet market demand, while other chip players have scrambled to create purpose-built chips for AI. As AI compute workloads come down with the emergence of smaller LMs, chips from a slew of other makers become viable options for handling AI compute.

Smaller LMs Are Now on Par with, or Better than, LLMs

It was long believed that LLMs would outperform smaller LMs. There is now a string of smaller models performing as well as or better than the largest LMs. It started with Meta’s Llama 2 13B, and now Orca 2’s 13B and 7B models outperform Llama 2 13B as well as Mistral AI’s 7B. These improvements come from changing how models learn, as evidenced by Orca 2’s approach. Better performance for less cost? That would seem to be the way for most enterprises to go.

Unleashing On-Device AI

As smaller LMs get both smaller and better, potent use cases for on-device AI on smartphones and PCs become more likely. Look for on-device AI champions such as Qualcomm, Dell Technologies, and others to parlay smaller LMs into on-device offerings.
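The same arithmetic, plus weight quantization, explains the on-device case. The sketch below approximates the footprint of a 7B model at common precision levels; the numbers cover weights only and ignore runtime overhead and context caches.

```python
# Approximate memory footprint of a 7B-parameter model at common
# quantization levels (weights only; runtime overhead not included).
PARAMS = 7e9

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> gigabytes
    print(f"7B @ {label}: ~{gb:.1f} GB")

# Prints roughly:
#   7B @ fp16: ~14.0 GB
#   7B @ int8: ~7.0 GB
#   7B @ int4: ~3.5 GB
```

At 4-bit precision, a 7B model lands around 3.5 GB, which is within reach of flagship smartphone memory and comfortably inside a modern PC’s RAM.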

Conclusion

Orca 2 is a significant breakthrough for the commercialization of AI. Microsoft was savvy to invest in the improvement of smaller LMs, particularly as the company thinks about PCs and the emergence of on-device AI. It is also further proof that Microsoft is not reliant on OpenAI’s GPT models. Look for further rapid refinement of smaller LMs and their swift adoption for enterprise use.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Microsoft Ignite Showcases AI Advancements with Copilot in Teams

Microsoft’s AI Safety Policies: Best Practice

Under The Hood: How Microsoft Copilot Tames LLM Issues

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business. He holds a Bachelor of Science from the University of Florida.

