Deepseek R1 AI Model: Runs on a Single GPU

DeepSeek's New AI Models: R1 and its Distilled Version

While the updated R1 reasoning AI model from DeepSeek is currently attracting significant attention within the AI community, the Chinese AI lab has also unveiled a more compact, “distilled” iteration of its R1. This smaller version, known as DeepSeek-R1-0528-Qwen3-8B, is asserted by DeepSeek to outperform models of similar size on specific benchmarks.

Performance Benchmarks of DeepSeek-R1-0528-Qwen3-8B

Built upon the Qwen3-8B model initially released by Alibaba in May, this streamlined R1 demonstrates superior performance compared to Google’s Gemini 2.5 Flash when evaluated on AIME 2025. AIME 2025 is a challenging suite of mathematical problems designed to test reasoning capabilities.

Furthermore, DeepSeek-R1-0528-Qwen3-8B achieves results that are nearly on par with Microsoft’s recently launched Phi 4 reasoning plus model during assessments of mathematical skills using the HMMT test.

Distilled Models: Trade-offs Between Size and Capability

Generally, distilled models, such as DeepSeek-R1-0528-Qwen3-8B, exhibit reduced capabilities when contrasted with their larger counterparts. However, a key advantage lies in their significantly lower computational requirements.

According to NodeShift, a cloud platform, running Qwen3-8B necessitates a GPU equipped with 40GB to 80GB of RAM, like an Nvidia H100. In contrast, the full-sized, updated R1 demands approximately twelve 80GB GPUs.

Training and Availability

DeepSeek developed DeepSeek-R1-0528-Qwen3-8B through a process of fine-tuning the Qwen3-8B model using text generated by the updated R1.

On its Hugging Face page dedicated to the model, DeepSeek positions DeepSeek-R1-0528-Qwen3-8B as suitable “for both academic research on reasoning models and industrial development focused on small-scale models.”

The model is released under a permissive MIT license, allowing for unrestricted commercial use. Currently, several platforms, including LM Studio, provide access to the model via an API.

Key Features

Model Name: DeepSeek-R1-0528-Qwen3-8B
Foundation Model: Qwen3-8B (Alibaba)
License: MIT License
Applications: Academic research and industrial development

Topics

More

Deepseek R1 AI Model: Runs on a Single GPU

DeepSeek's New AI Models: R1 and its Distilled Version

Performance Benchmarks of DeepSeek-R1-0528-Qwen3-8B

Distilled Models: Trade-offs Between Size and Capability

Training and Availability

Key Features

Related Posts

ChatGPT Launches App Store for Developers

Pickle Robot Appoints Tesla Veteran as First CFO

Peripheral Labs: Self-Driving Car Sensors Enhance Sports Fan Experience

Luma AI: Generate Videos from Start and End Frames

Alexa+ Adds AI to Ring Doorbells - Amazon's New Feature

Amazon Appoints Peter DeSantis to Lead New AI Organization