
DeepSeek releases ‘sparse attention’ model that cuts API costs in half

September 29, 2025

DeepSeek Introduces V3.2-exp for Reduced Inference Costs

On Monday, researchers from DeepSeek unveiled a new experimental model, designated V3.2-exp. This model is engineered to substantially decrease inference costs, particularly when handling operations requiring extensive context. DeepSeek publicized the model via a post on Hugging Face, accompanied by a corresponding academic paper available on GitHub.

DeepSeek Sparse Attention: A Key Innovation

The core advancement in the new model is termed DeepSeek Sparse Attention. The system works by prioritizing specific excerpts from the context window, using a component called a “lightning indexer” to decide which segments matter most.

Following that prioritization, a separate “fine-grained token selection system” picks specific tokens from within the chosen segments and loads them into the module’s limited attention window. Together, these mechanisms let Sparse Attention models operate over long stretches of context with comparatively modest server loads.
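To make the two-stage idea concrete, the sketch below shows one way such a scheme can be structured in Python with NumPy. The function names, block size, and token budget are illustrative assumptions, not DeepSeek’s actual implementation, which is detailed in the company’s paper.

```python
# Illustrative sketch only: a cheap block-level "indexer" followed by
# fine-grained token selection, then attention over the kept tokens.
# Names and sizes are assumptions, not DeepSeek's actual design.
import numpy as np

def lightning_indexer(query, keys, block_size=64, top_blocks=4):
    """Score fixed-size blocks of the context cheaply and keep the best ones."""
    n = keys.shape[0]
    block_scores = []
    for start in range(0, n, block_size):
        block = keys[start:start + block_size]
        # Cheap proxy score: similarity between the query and the block's mean key.
        block_scores.append((start, float(query @ block.mean(axis=0))))
    block_scores.sort(key=lambda s: s[1], reverse=True)
    return [start for start, _ in block_scores[:top_blocks]]

def fine_grained_selection(query, keys, block_starts, block_size=64, top_tokens=128):
    """Within the chosen blocks, keep only the individual tokens that score highest."""
    candidate_idx = np.concatenate(
        [np.arange(s, min(s + block_size, keys.shape[0])) for s in block_starts]
    )
    scores = keys[candidate_idx] @ query
    keep = candidate_idx[np.argsort(scores)[::-1][:top_tokens]]
    return np.sort(keep)

def sparse_attention(query, keys, values):
    """Attend only over the tokens chosen by the two-stage selection."""
    block_starts = lightning_indexer(query, keys)
    idx = fine_grained_selection(query, keys, block_starts)
    logits = keys[idx] @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values[idx]

# Toy usage: one query over a 4,096-token context with 64-dimensional heads.
rng = np.random.default_rng(0)
d, n = 64, 4096
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
print(sparse_attention(q, K, V).shape)  # (64,)
```

The point of the sketch is the cost structure: the full key matrix is only touched by a cheap block-scoring pass, while the expensive attention computation runs over a small, fixed token budget regardless of how long the context grows.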

Significant Cost Reductions Demonstrated

The system’s benefits are most pronounced in long-context tasks. DeepSeek’s preliminary testing indicates that the cost of an API call could fall by as much as half in long-context scenarios.

While further validation is necessary for a comprehensive assessment, the model’s open-weight nature and availability on Hugging Face will facilitate independent testing. This will allow for verification of the claims presented in the research paper.

Addressing the Challenge of Inference Costs

DeepSeek’s latest model contributes to a growing number of recent advancements focused on mitigating inference costs. Inference costs refer to the server expenses associated with running a pre-trained AI model, distinct from the costs of initial training.

In DeepSeek’s case, the research team aimed to enhance the efficiency of the fundamental transformer architecture. Their work reveals substantial opportunities for improvement in this area.

DeepSeek's Position in the AI Landscape

Based in China, DeepSeek has played an unusual role in the ongoing AI boom, particularly for those who view AI research as a competitive contest between the U.S. and China.

The company initially garnered attention with its R1 model, which was trained primarily using reinforcement learning at a lower cost compared to its American counterparts. However, the R1 model did not trigger the widespread revolution in AI training that some anticipated, and the company’s visibility has diminished in subsequent months.

Potential Impact on U.S. AI Providers

The new “sparse attention” methodology is unlikely to generate the same level of excitement as R1. Nevertheless, it could offer U.S.-based AI providers useful techniques for keeping inference costs down.
