From Silver to Gold in Twelve Months
Last year, DeepMind’s AlphaGeometry and AlphaProof systems collectively solved four of six IMO problems, earning a silver-medal equivalent. In July 2025 the research team leapfrogged that result: an advanced version of Gemini running in “Deep Think” mode solved five of the six tasks for 35 of 42 points, crossing the 2025 gold-medal threshold and setting a new AI milestone.
International coordinators graded Gemini’s written solutions using the same rubric applied to student competitors. According to IMO President Gregor Dolinar, the proofs were “clear, precise, and, in several cases, easy to follow”.
What Makes Deep Think Different?
| Technique | Purpose | Impact on Performance |
| --- | --- | --- |
| Parallel Thinking | Explores multiple proof avenues simultaneously, then merges the strongest ideas (sketched below). | Avoids dead-end, single-thread chains of thought. |
| Reinforcement-Learning Fine-Tune | Trains on curated theorem-proving and problem-solving data with reward signals for conciseness and rigor. | Raises the success rate on multi-step reasoning challenges. |
| High-Quality Solution Corpus | Ingests expertly written IMO proofs plus heuristic “tips & tricks.” | Gives the model stylistic and structural templates for clearer presentation. |
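To make the first two rows concrete, here is a minimal sketch of branch-and-merge sampling with a toy reward that prefers concise candidates. Every function name is a placeholder of ours; DeepMind has not published Deep Think’s implementation, and the “merge” step is reduced here to simple selection.

```python
import concurrent.futures
import random

def generate_proof_attempt(problem: str, seed: int) -> str:
    """Sample one candidate chain of reasoning (stubbed with randomness here)."""
    rng = random.Random(seed)
    steps = rng.randint(3, 8)
    return f"candidate proof of {problem!r} in {steps} steps (seed={seed})"

def score_proof(proof: str) -> float:
    """Toy reward in the spirit of the RL fine-tune row: shorter candidates
    score higher, standing in for signals like conciseness and rigor."""
    return 1.0 / len(proof)

def parallel_think(problem: str, branches: int = 8) -> str:
    """Explore several proof avenues at once, then keep the strongest one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=branches) as pool:
        candidates = list(pool.map(
            lambda seed: generate_proof_attempt(problem, seed),
            range(branches),
        ))
    # The "merge" is reduced to selection here; a real system could also
    # splice complementary lemmas from different branches.
    return max(candidates, key=score_proof)

if __name__ == "__main__":
    print(parallel_think("IMO 2025 Problem 1"))
```

A production system would replace the stubbed generator with actual model sampling; the point of the sketch is only the shape of the loop: fan out, score, keep the best.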
Benchmark Significance
- 35 / 42 points → comparable to a top-25-percent human gold medalist.
- Perfect scores on five problems; only one combinatorics task eluded the model.
- An order-of-magnitude speed-up vs. AlphaGeometry 2 + AlphaProof, which needed days of inference in 2024; Deep Think finished within the competition’s official 4.5-hour window.
While specialized theorem solvers have mastered narrow domains, Gemini Deep Think is a general LLM—capable of chat, code, and multimodal tasks—now showing elite mathematical reasoning.
Broader Implications
- Curriculum Design for AI: Gemini’s success underscores the value of domain-targeted reinforcement learning on top of large-scale pre-training.
- Parallel Thinking as a New Primitive: instead of a single “chain of thought,” future models may default to branch-and-merge reasoning, akin to how human teams brainstorm proofs.
- Human–AI Collaboration: DeepMind notes the technique could become a “proof assistant” for mathematicians, surfacing lemmas or counter-examples at gold-medal quality within minutes (see the counter-example sketch after this list).
- Educational Outreach: publishing the solutions provides a free study resource for aspiring IMO contestants and teachers, potentially leveling the global playing field.
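As a deliberately tiny illustration of the counter-example side of that proof-assistant workflow, the sketch below brute-forces a refutation of the (false) claim that n² + n + 41 is prime for every non-negative integer n. The helper names are ours, not part of any DeepMind tooling.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for a small search space like this one."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def find_counterexample(limit: int = 100) -> int | None:
    """Return the first n where Euler's polynomial n^2 + n + 41 is composite."""
    for n in range(limit):
        if not is_prime(n * n + n + 41):
            return n
    return None

print(find_counterexample())  # -> 40, since 40**2 + 40 + 41 == 41**2
```

An AI assistant operating at gold-medal quality would search far richer spaces than a single polynomial, but the payoff is the same: one concrete witness kills a false conjecture instantly.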
Limitations & Next Steps
- Interpretability: despite clearer written proofs, the internal decision tree remains opaque; researchers are now probing why certain branches survive the merge.
- Generalization: performance on under-represented areas (e.g., functional equations) still lags; future training will widen topic coverage.
- Trust & Verification: formal proof checkers like Lean are being integrated to machine-verify each Gemini output before publication (a minimal example follows this list).
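For a flavour of what “machine-verify” means in practice, here is a minimal Lean 4 example. It relies only on the core lemma `Nat.add_comm` and is unrelated to any specific IMO problem, so treat it purely as an illustration of kernel-checked proof.

```lean
-- The Lean kernel checks that this term is a genuine proof of the statement;
-- an ill-formed or incorrect proof would be rejected at compile time.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```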
DeepMind plans to open selected Deep Think capabilities via its Gemini API later this year, with safeguards to prevent misuse in academic competitions.
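For readers planning to experiment once that access opens, a call might look like the sketch below. It uses the published `google-genai` Python SDK, but the `gemini-deep-think` model identifier is our placeholder; Google has not announced what the Deep Think endpoint will be called.

```python
from google import genai  # pip install google-genai

client = genai.Client(api_key="YOUR_API_KEY")

# "gemini-deep-think" is a hypothetical model id used for illustration only;
# substitute whatever identifier Google publishes when access opens.
response = client.models.generate_content(
    model="gemini-deep-think",
    contents="Prove that the sum of two odd integers is even.",
)
print(response.text)
```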
Key Takeaway
Gemini Deep Think’s gold-medal performance doesn’t just raise the bar for AI mathematics—it redefines what general-purpose language models can achieve when armed with structured parallel reasoning and tailored RL training. The achievement brings researchers a step closer to AI systems that can tackle longstanding open problems and act as partner mathematicians rather than mere calculators.