OpenAI’s Math Model Hits Gold-Level Score at Global Olympiad

OpenAI’s newest experimental model has crossed an unexpected milestone. At this year’s International Math Olympiad (IMO), the system solved five out of six problems, earning a gold-level score typically reserved for elite young mathematicians. It reached 35 out of 42 possible points, placing it within the top 10 percent of over 600 contestants worldwide.

The Olympiad, first held in Romania in 1959, is considered one of the toughest math competitions in the world. Students face two exams over two days, each lasting four and a half hours and containing three problems. These questions aren't just about solving equations; they demand abstract reasoning, creative problem-solving, and a strong command of areas such as algebra, geometry, number theory, and combinatorics.
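For context, each IMO problem is worth 7 points under the competition's standard scoring (a detail not stated in this piece), so the arithmetic behind the reported result is straightforward:

$$2 \text{ days} \times 3 \text{ problems} \times 7 \text{ points} = 42 \text{ points}, \qquad \tfrac{35}{42} \approx 83\%.$$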

AI models have been tested on math before, but usually in lower-stakes settings. Just last year, researchers were still measuring model capability with basic arithmetic and high-school problems. This performance suggests the bar has moved considerably higher.

Performance Under Human Conditions

OpenAI’s model tackled the same problems as the human contestants, under the same time constraints. According to researchers involved, it showed an unusual ability to focus for long stretches and craft detailed, structured solutions, something previous language models have struggled to do.

Unlike DeepMind’s AlphaGeometry, which was built specifically for geometry problems, OpenAI’s system is a general-purpose language model. That makes the result more surprising. The model wasn’t tuned to master Olympiad-style problems; instead, it drew on broader training and still kept up.

Team members described it as capable of sustained reasoning, working through problems with a level of endurance and logic that pushed past previous benchmarks. According to internal commentary, the model didn’t just recall formulas or mimic surface-level patterns. It built full mathematical arguments, step by step.

Predictions That Didn't Hold

The result also turned a few expert predictions on their head. Just weeks before the competition, mathematician Terence Tao suggested that AI models might struggle to reach Olympiad standards. In a podcast appearance, he pointed to simpler contests as more realistic short-term targets.

Similar doubts had come from other corners of the tech world. In 2024, investor Peter Thiel speculated that models wouldn’t be able to solve problems at this level for at least three more years. That forecast didn’t age well.

Still, even with this breakthrough, OpenAI is not rushing to deploy the model publicly. CEO Sam Altman stated that this version won’t be released anytime soon. While upcoming systems like GPT-5 are expected to improve on current capabilities, they won’t feature this level of mathematical reasoning, at least not yet.

Reactions from Across the Field

The response has been mixed. AI researcher Alexander Wei, who helped lead the work, described the success as a major step toward more general reasoning skills in AI. But not everyone is ready to call it a turning point.

Gary Marcus, a long-time critic of AI overhype, acknowledged the performance as impressive. At the same time, he raised questions about how the model was trained, whether the IMO organizers would confirm the results, and what real-world value such systems might bring. He also asked how much it cost to reach this level of performance, and whether that kind of investment could scale.

As of now, the Olympiad’s organizers have not independently verified the model’s results. That leaves some room for scrutiny. But even without formal confirmation, the development signals how fast things are moving. A year ago, the idea of an LLM competing at this level seemed far off. Now, it’s suddenly on the scoreboard.


