Ai2's New AI Model Outperforms DeepSeek V3

A New AI Leader Emerges
The landscape of artificial intelligence is shifting. A new model has arrived, challenging the dominance of existing systems like DeepSeek V3.
Ai2, a Seattle-based nonprofit AI research institute, unveiled the model on Thursday, asserting that it surpasses the capabilities of DeepSeek V3, a prominent system developed by the Chinese AI company DeepSeek.
Performance and Open Source Availability
Ai2’s model, designated Tulu 3 405B, demonstrates superior performance to OpenAI’s GPT-4o on specific AI benchmarks, as indicated by Ai2’s internal evaluations.
Notably, Tulu 3 405B distinguishes itself from both GPT-4o and DeepSeek V3 through its open-source nature: the complete set of components required to replicate it is freely accessible and permissively licensed.
U.S. Leadership in AI Development
A representative from Ai2 communicated to TechCrunch that the organization believes Tulu 3 405B highlights the United States’ potential to spearhead the global advancement of top-tier generative AI models.
The spokesperson emphasized that this achievement is a significant milestone for the future of open AI. It reinforces the U.S.’s standing as a leader in competitive, open-source models.
The spokesperson added that the launch introduces a robust, U.S.-developed alternative to DeepSeek’s models, calling it a pivotal moment that demonstrates the U.S.’s capacity to lead with competitive, open-source AI independent of major technology corporations.
Model Size and Training
Tulu 3 405B is a substantial model, boasting 405 billion parameters.
According to Ai2, its training necessitated the parallel operation of 256 GPUs. The number of parameters generally correlates with a model’s problem-solving abilities, with larger models typically exhibiting better performance.
Reinforcement Learning with Verifiable Rewards
Ai2 attributes a key aspect of Tulu 3 405B’s competitive performance to a technique known as reinforcement learning with verifiable rewards (RLVR).
RLVR focuses on training models using tasks with “verifiable” outcomes, such as mathematical problem-solving and instruction following.
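To make the idea concrete, here is a minimal, hypothetical sketch of what a "verifiable reward" can look like. This is purely illustrative and not Ai2's actual implementation: the key point is that the reward comes from a programmatic check against a known-correct answer rather than from a learned reward model.

```python
# Illustrative sketch of a verifiable reward for RLVR (not Ai2's code).
# Tasks like math word problems have a single checkable answer, so the
# reward can be computed deterministically instead of being predicted.

def extract_final_answer(completion: str) -> str:
    """Take the last non-empty line of the completion as the candidate
    answer (an assumed output convention for this sketch)."""
    return completion.strip().splitlines()[-1].strip()

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the verifiable
    ground truth, else 0.0; this score feeds the RL training loop."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

# Example: a grade-school math problem with a checkable answer.
completion = "First compute 12 * 7 = 84, then add 6.\n90"
print(verifiable_reward(completion, "90"))  # 1.0
```

In a full RLVR setup, scores like this would serve as the reward signal for a policy-gradient update; real systems also need more robust answer parsing (e.g., normalizing numbers and formats) than the exact string match used here.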
Benchmark Results
Ai2 reports that Tulu 3 405B outperformed DeepSeek V3, GPT-4o, and Meta’s Llama 3.1 405B model on the PopQA benchmark. PopQA consists of 14,000 specialized knowledge questions derived from Wikipedia.
Furthermore, Tulu 3 405B achieved the highest performance among models in its class on GSM8K, a test comprising grade school-level math word problems.
Availability and Access
Tulu 3 405B is currently available for testing through Ai2’s chatbot web application.
The code required to train the model is available on GitHub and on the AI development platform Hugging Face.