Microsoft isn’t simply relying on its success with OpenAI; the company is taking bold steps forward by unveiling three new models in its evolving Phi series of language and multimodal AI.
The latest models, dubbed Phi-3.5, include the 3.82-billion-parameter Phi-3.5-mini-instruct, the 41.9-billion-parameter Phi-3.5-MoE-instruct, and the 4.15-billion-parameter Phi-3.5-vision-instruct. These models are tailored for fast reasoning, advanced reasoning, and vision tasks like image and video analysis, respectively.
Developers can now download, use, and customize these models on Hugging Face under the MIT license, which allows unrestricted commercial use and modification.
Remarkably, these models deliver near state-of-the-art performance on various third-party benchmark tests, outperforming some offerings from competitors like Google’s Gemini 1.5 Flash, Meta’s Llama 3.1, and even OpenAI’s GPT-4o in certain cases. This achievement, coupled with the open license, has garnered Microsoft praise on social media platform X.
Here’s a brief overview of the new models based on their release notes on Hugging Face:
Phi-3.5 Mini Instruct: Optimized for Resource-Constrained Environments
The Phi-3.5 Mini Instruct model is a compact AI model with 3.8 billion parameters, designed for instruction adherence and supporting a 128k token context length. This model is ideal for tasks requiring strong reasoning abilities in environments with limited memory or computing power, such as code generation, mathematical problem-solving, and logic-based reasoning. Despite its small size, it performs competitively in multilingual and multi-turn conversational tasks and surpasses similarly sized models like Llama-3.1-8B-instruct and Mistral-7B-instruct on benchmarks like RepoQA, which measures long-context code understanding.
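Since the model is freely downloadable, a typical way to try it is through the Hugging Face `transformers` library. The sketch below is illustrative, not official Microsoft sample code: the model ID is real, but the generation settings and the system prompt are assumptions, and the heavy imports are kept inside the function so the file can be read without downloading the weights.

```python
# Sketch: chatting with Phi-3.5-mini-instruct via the transformers
# text-generation pipeline. Generation settings are illustrative.

def build_chat(user_prompt: str) -> list[dict]:
    """Build a chat-format message list for the model's chat template.
    The system prompt here is an arbitrary example, not Microsoft's."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept local so this sketch can be inspected (and the
    # helper above tested) without the model weights installed.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="microsoft/Phi-3.5-mini-instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(build_chat(user_prompt), max_new_tokens=max_new_tokens)
    # With chat-style input, recent transformers versions return the full
    # conversation; the assistant's reply is the last message.
    return out[0]["generated_text"][-1]["content"]

# Example call (downloads the model weights on first use):
# print(generate("Write a Python function that reverses a string."))
```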
Phi-3.5 MoE: Microsoft’s ‘Mixture of Experts’
The Phi-3.5 MoE (Mixture of Experts) model is Microsoft’s first in this category, combining multiple specialized sub-models, or “experts,” into one network. It has roughly 42 billion parameters in total and supports a 128k token context length, offering scalable AI performance for demanding applications. Because only a subset of experts is engaged for each token, however, the model runs with just 6.6 billion active parameters, according to its Hugging Face documentation.
Designed for a range of reasoning tasks, Phi-3.5 MoE excels in code, math, and multilingual language understanding, often outperforming larger models in specific benchmarks like RepoQA. It also outshines GPT-4o mini on the 5-shot MMLU (Massive Multitask Language Understanding) across subjects including STEM, humanities, and social sciences.
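The gap between total and active parameters comes from how mixture-of-experts routing works: a gating function scores the experts and only the top-scoring few process each token. The toy sketch below illustrates that idea in plain Python; it is a conceptual illustration only, not Phi-3.5 MoE's actual gating network or expert count.

```python
# Toy mixture-of-experts routing: only the top-k experts run per token,
# so the "active" parameter count is far below the total.
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(gate_scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))

# Example: 16 experts, but only 2 are activated for this token.
scores = [0.1, 2.0, -0.5, 1.2] + [0.0] * 12
print(route_top_k(scores))  # experts 1 and 3 are selected
```

Because the other experts' weights are never touched for that token, inference cost scales with the active parameters rather than the total.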
Phi-3.5 Vision Instruct: Advanced Multimodal Reasoning
Rounding out the trio is the Phi-3.5 Vision Instruct model, which integrates text and image processing capabilities. This multimodal model is particularly well-suited for tasks such as general image understanding, optical character recognition, chart and table comprehension, and video summarization. Like the other models in the Phi-3.5 series, Vision Instruct supports a 128k token context length, enabling it to handle complex, multi-frame visual tasks. Microsoft emphasizes that this model was trained using a combination of synthetic and filtered publicly available datasets, focusing on high-quality, reasoning-rich data.
Training the New Phi Models
The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs over 10 days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs over 6 days. The Phi-3.5 MoE model, featuring a mixture-of-experts architecture, was trained on 4.9 trillion tokens with 512 H100-80G GPUs over 23 days.
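The published figures imply the average per-GPU training throughput, assuming round-the-clock utilization. The token and GPU counts below come from the release notes; the derived rates are a back-of-the-envelope calculation, not official numbers.

```python
# Rough per-GPU throughput implied by the published training runs.

def tokens_per_gpu_second(total_tokens: float, num_gpus: int, days: float) -> float:
    """Average tokens processed per GPU per second, assuming full utilization."""
    return total_tokens / (num_gpus * days * 86_400)

runs = {
    "Phi-3.5-mini-instruct (H100)":   (3.4e12, 512, 10),
    "Phi-3.5-MoE-instruct (H100)":    (4.9e12, 512, 23),
    "Phi-3.5-vision-instruct (A100)": (5.0e11, 256, 6),
}
for name, (tokens, gpus, days) in runs.items():
    rate = tokens_per_gpu_second(tokens, gpus, days)
    print(f"{name}: ~{rate:,.0f} tokens/s per GPU")
```

Note how the MoE run, despite its far larger total parameter count, sustains a throughput in the same order of magnitude as the dense runs, consistent with only 6.6 billion parameters being active per token.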
Open-Source Under MIT License
All three Phi-3.5 models are available under the MIT license, reflecting Microsoft’s commitment to supporting the open-source community. This license permits developers to freely use, modify, merge, publish, distribute, sublicense, or sell copies of the software. The license also includes a disclaimer that the software is provided “as is,” without warranties, and Microsoft and other copyright holders are not liable for any claims, damages, or other liabilities that may arise from the software’s use.
Microsoft’s launch of the Phi-3.5 series represents a major advancement in multilingual and multimodal AI. By making these models available under an open-source license, Microsoft is empowering developers to integrate cutting-edge AI capabilities into their applications, driving innovation in both commercial and research settings.