Microsoft isn’t simply relying on its success with OpenAI; the company is taking bold steps forward by unveiling three new models in its evolving Phi series of language and multimodal AI.
The latest models, dubbed Phi-3.5, include the 3.82-billion-parameter Phi-3.5-mini-instruct, the 41.9-billion-parameter Phi-3.5-MoE-instruct, and the 4.15-billion-parameter Phi-3.5-vision-instruct. These models are tailored for fast reasoning, advanced reasoning, and vision tasks like image and video analysis, respectively.
Developers can now download, use, and customize these models on Hugging Face under the MIT license, which allows unrestricted commercial use and modification.
Remarkably, these models deliver near state-of-the-art performance on various third-party benchmark tests, outperforming some offerings from competitors like Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and even OpenAI’s GPT-4o in certain cases. This achievement, coupled with the open license, has garnered Microsoft praise on social media platform X.
Here’s a brief overview of the new models based on their release notes on Hugging Face:
Phi-3.5 Mini Instruct: Optimized for Resource-Constrained Environments
The Phi-3.5 Mini Instruct model is a compact AI model with 3.8 billion parameters, designed for instruction adherence and supporting a 128k token context length. This model is ideal for tasks requiring strong reasoning abilities in environments with limited memory or computing power, such as code generation, mathematical problem-solving, and logic-based reasoning. Despite its small size, it performs competitively in multilingual and multi-turn conversational tasks and surpasses similarly sized models like Llama-3.1-8B-instruct and Mistral-7B-instruct on benchmarks like RepoQA, which measures long-context code understanding.
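Since the model is freely downloadable, a typical way to try it is through the Hugging Face `transformers` library. The sketch below is illustrative, not official Microsoft sample code: the model ID is real, but the generation settings and the system prompt are assumptions, and the heavy imports are kept inside the function so the file can be read without downloading the weights.

```python
# Sketch: chatting with Phi-3.5-mini-instruct via the transformers
# text-generation pipeline. Generation settings are illustrative.

def build_chat(user_prompt: str) -> list[dict]:
    """Build a chat-format message list for the model's chat template.
    The system prompt here is an arbitrary example, not Microsoft's."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept local so this sketch can be inspected (and the
    # helper above tested) without the model weights installed.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="microsoft/Phi-3.5-mini-instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(build_chat(user_prompt), max_new_tokens=max_new_tokens)
    # With chat-style input, recent transformers versions return the full
    # conversation; the assistant's reply is the last message.
    return out[0]["generated_text"][-1]["content"]

# Example call (downloads the model weights on first use):
# print(generate("Write a Python function that reverses a string."))
```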
Phi-3.5 MoE: Microsoft’s ‘Mixture of Experts’
The Phi-3.5 MoE (Mixture of Experts) model is Microsoft’s first in this category, combining multiple specialized sub-models, or “experts,” into one network. It has roughly 42 billion parameters in total and supports a 128k token context length, offering scalable AI performance for demanding applications. Because only a subset of experts is engaged for each token, however, the model runs with just 6.6 billion active parameters, according to its Hugging Face documentation.
Designed for a range of reasoning tasks, Phi-3.5 MoE excels in code, math, and multilingual language understanding, often outperforming larger models in specific benchmarks like RepoQA. It also outshines GPT-4o mini on the 5-shot MMLU (Massive Multitask Language Understanding) across subjects including STEM, humanities, and social sciences.
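The gap between total and active parameters comes from how mixture-of-experts routing works: a gating function scores the experts and only the top-scoring few process each token. The toy sketch below illustrates that idea in plain Python; it is a conceptual illustration only, not Phi-3.5 MoE's actual gating network or expert count.

```python
# Toy mixture-of-experts routing: only the top-k experts run per token,
# so the "active" parameter count is far below the total.
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(gate_scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))

# Example: 16 experts, but only 2 are activated for this token.
scores = [0.1, 2.0, -0.5, 1.2] + [0.0] * 12
print(route_top_k(scores))  # experts 1 and 3 are selected
```

Because the other experts' weights are never touched for that token, inference cost scales with the active parameters rather than the total.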
Phi-3.5 Vision Instruct: Advanced Multimodal Reasoning
Rounding out the trio is the Phi-3.5 Vision Instruct model, which integrates text and image processing capabilities. This multimodal model is particularly well-suited for tasks such as general image understanding, optical character recognition, chart and table comprehension, and video summarization. Like the other models in the Phi-3.5 series, Vision Instruct supports a 128k token context length, enabling it to handle complex, multi-frame visual tasks. Microsoft emphasizes that this model was trained using a combination of synthetic and filtered publicly available datasets, focusing on high-quality, reasoning-rich data.
Training the New Phi Models
The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs over 10 days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs over 6 days. The Phi-3.5 MoE model, featuring a mixture-of-experts architecture, was trained on 4.9 trillion tokens with 512 H100-80G GPUs over 23 days.
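The published figures imply the average per-GPU training throughput, assuming round-the-clock utilization. The token and GPU counts below come from the release notes; the derived rates are a back-of-the-envelope calculation, not official numbers.

```python
# Rough per-GPU throughput implied by the published training runs.

def tokens_per_gpu_second(total_tokens: float, num_gpus: int, days: float) -> float:
    """Average tokens processed per GPU per second, assuming full utilization."""
    return total_tokens / (num_gpus * days * 86_400)

runs = {
    "Phi-3.5-mini-instruct (H100)":   (3.4e12, 512, 10),
    "Phi-3.5-MoE-instruct (H100)":    (4.9e12, 512, 23),
    "Phi-3.5-vision-instruct (A100)": (5.0e11, 256, 6),
}
for name, (tokens, gpus, days) in runs.items():
    rate = tokens_per_gpu_second(tokens, gpus, days)
    print(f"{name}: ~{rate:,.0f} tokens/s per GPU")
```

Note how the MoE run, despite its far larger total parameter count, sustains a throughput in the same order of magnitude as the dense runs, consistent with only 6.6 billion parameters being active per token.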
Open-Source Under MIT License
All three Phi-3.5 models are available under the MIT license, reflecting Microsoft’s commitment to supporting the open-source community. This license permits developers to freely use, modify, merge, publish, distribute, sublicense, or sell copies of the software. The license also includes a disclaimer that the software is provided “as is,” without warranties, and Microsoft and other copyright holders are not liable for any claims, damages, or other liabilities that may arise from the software’s use.
Microsoft’s launch of the Phi-3.5 series represents a major advancement in multilingual and multimodal AI. By making these models available under an open-source license, Microsoft is empowering developers to integrate cutting-edge AI capabilities into their applications, driving innovation in both commercial and research settings.