Microsoft Orca 2: The Biggest Generative AI Breakthrough Since ChatGPT

The News: On November 20, Microsoft Research announced the debut of Orca 2, the company’s latest step in exploring the capabilities of smaller language models (LMs), which it defines as models with roughly 10 billion parameters or fewer. Here are the key details:

  • Orca 2 significantly surpasses models of similar size (including the original Orca 13B model), attaining performance levels “similar to or better than models 5-10 times larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings.”
  • In benchmark testing for language understanding, common sense reasoning, multistep reasoning, math problem-solving, reading comprehension, summarizing, groundedness, truthfulness, and toxic content generation and identification, Orca 2 outperformed large LMs (LLMs) Llama 2 Chat 13B, Llama 2 Chat 70B, WizardLM 13B, and WizardLM 70B.
  • The model comes in two sizes, 7 billion and 13 billion parameters.
  • The models were created by fine-tuning Llama 2 base models on tailored synthetic data.
  • Microsoft researchers say the key to Orca 2’s performance is teaching it solution strategies different from those a larger model would use. For example, an LLM such as GPT-4 can answer complex tasks directly, while a smaller model may benefit from breaking the task into steps.
    • The training data was generated such that it teaches Orca 2 various reasoning techniques, such as step-by-step processing, recall then generate, recall-reason-generate, extract-generate, and direct answer methods, while teaching it to choose different solution strategies for different tasks.
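The strategy-selection idea above can be illustrated with a toy sketch. This is not Microsoft’s training pipeline; the routing heuristic and prompt wording are hypothetical, and only the strategy names come from the announcement. The sketch shows how synthetic training data might pair each task with an instruction for a different reasoning strategy:

```python
# Illustrative sketch only: pair a task with a reasoning strategy,
# mirroring the idea that a small model is taught to choose a strategy
# rather than always answering directly. Strategy names follow the
# Orca 2 announcement; the routing heuristic is a hypothetical stand-in.

STRATEGY_PROMPTS = {
    "step-by-step": "Solve the task one step at a time, showing your work.",
    "recall-then-generate": "First recall relevant facts, then compose the answer.",
    "direct-answer": "Answer directly and concisely.",
}

def choose_strategy(task: str) -> str:
    """Pick a reasoning strategy for a task (hypothetical heuristic)."""
    task_lower = task.lower()
    if any(word in task_lower for word in ("calculate", "how many", "solve")):
        return "step-by-step"
    if any(word in task_lower for word in ("who ", "when ", "where ")):
        return "recall-then-generate"
    return "direct-answer"

def build_prompt(task: str) -> str:
    """Prepend the chosen strategy instruction to the task, roughly how
    synthetic training data for a small model might be framed."""
    return f"{STRATEGY_PROMPTS[choose_strategy(task)]}\n\nTask: {task}"

print(build_prompt("How many primes are below 20?"))
```

In the actual Orca 2 work, the strategy choice is learned from demonstrations generated by a more capable model, not hard-coded rules like these.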

Read more in the blog post “Orca 2: Teaching Small Language Models How to Reason” on the Microsoft website.

Analyst Take: Microsoft’s Orca 2 is perhaps one of the most significant breakthroughs in generative AI since the release of ChatGPT. Here is why:

AI Compute Workloads Will Get Smaller

The massive compute workloads required to train LLMs have worried the AI ecosystem since ChatGPT launched. Could enterprises afford the compute required to train and run (inference) AI models? Would the economics work against commercial AI solutions? High-performing smaller models will cost less to support than larger LLMs and ensure better paths to commercialized AI.

AI Chip Market Dynamics Change

Market bottlenecks are occurring because massive AI compute workloads require the most powerful graphics processing units (GPUs) for AI training. NVIDIA cannot make enough of its coveted GPUs to meet market demand, while other chip players have scrambled to create purpose-built chips for AI. As AI compute workloads shrink with the emergence of smaller LMs, chips from a slew of other makers become viable options for AI compute.

Smaller LMs Now Perform on Par With or Better Than LLMs

It was long assumed that LLMs would outperform smaller LMs. There is now a string of smaller models performing as well as or better than the largest models. It started with Meta’s Llama 2 13B, and now Orca 2’s 13B and 7B models outperform Llama 2 13B as well as Mistral AI’s 7B. These improvements come from changing how models learn, as evidenced by Orca 2’s approach. Better performance for less cost? That would seem to be the way for most enterprises to go.

Unleashing On-Device AI

As smaller LMs get both smaller and better, potent use cases for on-device AI for smartphones and PCs become more likely. Look for on-device AI champions such as Qualcomm, Dell Technologies, and others to parlay smaller LMs into on-device AI.

Conclusion

Orca 2 is a significant breakthrough for the commercialization of AI. Microsoft was savvy to invest in the improvement of LMs, particularly as the company thinks about PCs and the emergence of on-device AI. It is also further proof that Microsoft is not reliant on OpenAI’s GPT models. Look for further rapid refinement of smaller LMs and their rapid adoption for enterprise use.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually and data and other information that might have been provided for validation, not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Microsoft Ignite Showcases AI Advancements with Copilot in Teams

Microsoft’s AI Safety Policies: Best Practice

Under The Hood: How Microsoft Copilot Tames LLM Issues

Author Information

Based in Tampa, Florida, Mark is a veteran market research analyst with 25 years of experience interpreting the technology business, and he holds a Bachelor of Science from the University of Florida.
