
DeepSeek Math V2: The Open-Source Reasoner Achieving Gold-Level IMO Performance
Overview
AI reasoning has taken an exciting turn. DeepSeek, a Chinese research lab that previously astonished the industry with its affordable, high-performance models, has unveiled DeepSeek Math V2—a specialized open-source math reasoner. This model claims to achieve gold-level performance on 2025 International Mathematical Olympiad (IMO) problems, nearly matching Google DeepMind’s Gemini Deep Think on rigorous proof benchmarks, and scoring an impressive 118 out of 120 on the 2024 Putnam exam. Unlike proprietary alternatives, DeepSeek is providing weights, code, and a training recipe for the community to explore, fine-tune, and enhance the model.
This development is significant for more than just accolades. Making high-quality mathematical reasoning available for download has the potential to accelerate advancements in engineering, science, and safety research. However, it also presents new challenges regarding verification, training data integrity, and the responsible use of such systems.
In this article, we delve into what has been released, its performance, the inner workings of the model, comparisons to Google and OpenAI, and best practices for safe usage.
What DeepSeek Released
- Model Family: DeepSeek Math V2 is built on the DeepSeek V3.2 experimental base, a 685-billion-parameter mixture-of-experts (MoE) configuration according to its Hugging Face card. The repository includes a brief paper, figures, and sample outputs, released under the Apache 2.0 license for both code and model.
- Training Recipe: Training uses a verifier model that grades proof steps; the verifier then serves as a reward model, teaching a generator to produce rigorous proofs that survive its scrutiny. This generator-verifier loop is designed to keep verification a step ahead of generation, enforcing detailed, step-by-step rigor.
- Release Artifacts: Available on GitHub is the paper along with outputs, and there’s a Hugging Face listing for downloads and a quick start. Be aware that reproducing their top results will require substantial computational resources due to the system’s demands at test-time.
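The generator-verifier loop described in the training recipe can be sketched in a few lines. The following is a toy illustration, not DeepSeek's actual code: `generate_proof` and `verify_proof` stand in for the real generator and verifier models, and the verifier's score stands in for the RL reward signal.

```python
import random

def generate_proof(problem, temperature):
    # Toy stand-in for the generator LLM: returns a candidate "proof"
    # as a list of steps. (temperature is unused in this toy version;
    # a real generator would sample at different temperatures.)
    n_steps = random.randint(1, 5)
    return [f"step {i} for {problem}" for i in range(n_steps)]

def verify_proof(proof):
    # Toy stand-in for the verifier: scores each step and returns an
    # overall reward in [0, 1]. The real verifier is itself a trained
    # model that judges step-by-step rigor.
    step_scores = [1.0 if "step" in s else 0.0 for s in proof]
    return sum(step_scores) / len(step_scores)

def training_iteration(problems, threshold=0.9):
    # One generator-verifier iteration: sample several candidate
    # proofs per problem, keep the best one if the verifier accepts
    # it, and (in the real recipe) use the verifier's score as the
    # RL reward that updates the generator.
    accepted = []
    for p in problems:
        candidates = [generate_proof(p, t) for t in (0.2, 0.7, 1.0)]
        scored = [(verify_proof(c), c) for c in candidates]
        best_score, best = max(scored, key=lambda x: x[0])
        if best_score >= threshold:
            accepted.append((p, best))
    return accepted
```

The key design point is the asymmetry: checking a proof step is easier than producing one, so a verifier trained to be slightly stronger than the current generator can keep supplying a useful reward as the generator improves.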
Performance Highlights
DeepSeek boasts remarkable results:
- IMO 2025 Problems: The model solved 5 of 6 problems, a score DeepSeek characterizes as gold-level based on its own evaluation.
- Putnam 2024: Achieved 118 of 120, with near-full marks on most problems when given substantial test-time compute.
- CMO 2024: Met gold-level standards.
- IMO-ProofBench: Scored 61.9% on the Advanced Proof set and around 99% on the Basic set, placing it in close proximity to Google’s Gemini Deep Think.
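The heavy test-time compute behind these scores typically amounts to sampling many candidate solutions and reranking them with the verifier; whether DeepSeek's pipeline works exactly this way is an assumption here, but best-of-n with verifier reranking is the standard pattern. A minimal sketch with toy stand-ins:

```python
import random

def best_of_n(problem, generate, score, n=8):
    # Draw n candidate solutions and keep the one the verifier scores
    # highest. Larger n trades more compute for higher reliability.
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins for illustration only; the real generator and
# verifier are large language models.
def toy_generate(problem):
    return {"problem": problem, "quality": random.random()}

def toy_score(candidate):
    return candidate["quality"]
```

With a real verifier, `score` would grade each candidate proof step by step, and the winning candidate's score could also gate whether to keep sampling, so extra compute is spent only on problems that remain unsolved.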
It’s essential to note the benchmarking context. Google DeepMind has published officially graded results for Gemini Deep Think, which achieved gold-level performance working entirely in natural language. DeepSeek’s IMO claims, by contrast, come from its own write-up rather than official competition grading, so the results call for careful interpretation.
Comparison with Google and OpenAI
- Google DeepMind: Gemini Deep Think not only reached gold-level performance by solving 5 out of 6 problems but also did so within the competition’s time limits and fully in natural language—a notable advancement from previous formal-language approaches.
- ProofBench Scores: On the Advanced IMO-ProofBench, Gemini Deep Think is rated at approximately 65.7%, while general-purpose models score considerably lower. DeepSeek Math V2’s 61.9% puts it within striking distance on this stringently evaluated benchmark.
Overall, this data suggests that specialized models like Deep Think can vastly improve performance in complex mathematical tasks, and open-source models like DeepSeek are making strides in this domain through rigorous verification methods.
Why This is a Breakthrough
- Open Access to Advanced Reasoning: While Google and OpenAI have demonstrated what proprietary systems can achieve, DeepSeek is pioneering by releasing an open-source package that performs comparably, paving the way for reproducibility and independent evaluations.
- Emphasis on Proofs Rather than Just Answers: DeepSeek’s methodology prioritizes self-verification and the correctness of full proofs, which aligns more closely with traditional mathematical practice than answer-only benchmarks do.
- Community-Driven Innovation: With publicly available weights and resources, researchers are equipped to experiment with various verification objectives, enhance test-time scaling, and integrate the model into existing formal systems.
Thank You for Reading this Blog and See You Soon! 🙏 👋