
AI2's New AI Model Outperforms DeepSeek

January 30, 2025

A New AI Leader Emerges

The landscape of artificial intelligence is shifting. A new model has arrived, challenging the dominance of existing systems like DeepSeek V3.

Ai2, a Seattle-based nonprofit AI research institute, unveiled its new model on Thursday, asserting that it surpasses the capabilities of DeepSeek V3, a prominent system developed by the Chinese AI company DeepSeek.

Performance and Open Source Availability

Ai2’s model, designated Tulu 3 405B, outperforms OpenAI’s GPT-4o on certain AI benchmarks, according to Ai2’s internal evaluations.

Notably, Tulu 3 405B distinguishes itself from both GPT-4o and DeepSeek V3 through its open-source nature: the complete set of components required to replicate it is freely accessible and permissively licensed.

U.S. Leadership in AI Development

A representative from Ai2 told TechCrunch that the organization believes Tulu 3 405B highlights the United States’ potential to spearhead the global advancement of top-tier generative AI models.

The spokesperson emphasized that this achievement is a significant milestone for the future of open AI. It reinforces the U.S.’s standing as a leader in competitive, open-source models.

They further stated that the launch introduces a robust, U.S.-developed alternative to DeepSeek’s models. This represents a crucial moment in AI development, demonstrating the U.S.’s capacity to lead with competitive, open-source AI, independent of major technology corporations.

Model Size and Training

Tulu 3 405B is a substantial model, boasting 405 billion parameters.

According to Ai2, training it required running 256 GPUs in parallel. The number of parameters generally correlates with a model’s problem-solving abilities, with larger models typically exhibiting better performance.

Reinforcement Learning with Verifiable Rewards

Ai2 attributes a key aspect of Tulu 3 405B’s competitive performance to a technique known as reinforcement learning with verifiable rewards (RLVR).

RLVR focuses on training models using tasks with “verifiable” outcomes, such as mathematical problem-solving and instruction following.
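
To make the technique concrete, below is a minimal sketch of a verifiable reward: instead of scoring outputs with a learned reward model, the reward is computed by programmatically checking the model’s answer against a known result. The function and example are purely illustrative and are not Ai2’s actual RLVR code.

```python
# Illustrative sketch of a "verifiable reward" (not Ai2's implementation):
# the reward comes from a deterministic check of the model's output,
# e.g. comparing the final number in a math solution to the known answer.
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the last number in the output equals the known answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

# A grade-school math problem with a checkable answer:
good = "Sara has 3 bags with 4 apples each, so she has 3 * 4 = 12 apples."
bad = "She has 11 apples."
print(verifiable_reward(good, "12"))  # 1.0 -> this completion gets reinforced
print(verifiable_reward(bad, "12"))   # 0.0 -> no reward signal
```

In an RLVR training loop, checks like this stand in for a learned reward model on tasks whose outcomes can be verified automatically, such as math problems and instruction following.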

Benchmark Results

Ai2 reports that Tulu 3 405B outperformed DeepSeek V3, GPT-4o, and Meta’s Llama 3.1 405B model on the PopQA benchmark. PopQA consists of 14,000 specialized knowledge questions derived from Wikipedia.

Furthermore, Tulu 3 405B achieved the highest performance among models in its class on GSM8K, a test comprising grade school-level math word problems.

Availability and Access

Tulu 3 405B is currently available for testing through Ai2’s chatbot web application.

The code required to train the model can be found on GitHub and the AI development platform Hugging Face.
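
For those who want to load the weights programmatically, the following is a minimal sketch using the Hugging Face transformers library. The repository id shown is an assumption based on Ai2’s naming conventions (check the Hugging Face hub for the exact id), and running a 405-billion-parameter model requires substantial multi-GPU hardware.

```python
# Minimal sketch of loading Tulu 3 405B via the Hugging Face `transformers` library.
# The repo id below is an assumption; verify it on the Hugging Face hub.
# Note: a 405B-parameter model needs large multi-GPU infrastructure to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Llama-3.1-Tulu-3-405B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain reinforcement learning with verifiable rewards in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```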

