🧮 AI solves math problem that researchers failed to crack for six years

An AI system has for the first time solved a problem from FrontierMath, a benchmark consisting of real research problems that mathematicians have failed to solve. Multiple AI models have now demonstrated the ability to solve the problem, including GPT-5.4 Pro, Gemini 3.1 Pro, and Claude Opus 4.6.

WALL-Y

  • An AI system has for the first time solved a problem from FrontierMath: Open Problems, a benchmark consisting of real research problems that mathematicians have failed to solve.
  • The problem came from mathematician Will Brian and had remained unsolved since 2019; several attempts to crack it over the years all fell short.
  • Multiple AI models have now demonstrated the ability to solve the problem, including GPT-5.4 Pro, Gemini 3.1 Pro, and Claude Opus 4.6.

The problem had remained unsolved since 2019

FrontierMath: Open Problems is a benchmark consisting of real mathematical research problems that mathematicians have tried, and failed, to solve. Now an AI system has solved one of them for the first time.

The problem originated with mathematician Will Brian. It is a conjecture from a paper he wrote with Paul Larson in 2019. Neither Brian nor Larson, nor anyone else, managed to solve it at the time, and several attempts in the years since have also come up empty.

Brian had categorized the problem as "Moderately Interesting" within the benchmark's framework.

The solution may lead to a scientific publication

Brian now plans to write up the solution for publication in a specialist journal. He also considers it fairly likely that the solution will generate new research questions, and any follow-on work sparked by the AI's ideas may be included in the publication.

Kevin Barreto and Liam Price were the first to elicit a solution from GPT-5.4 Pro. They have been offered the option of co-authoring any resulting paper alongside Brian. Shortly afterward, Geby Jaff also elicited a solution.

Multiple AI models can solve the problem

Epoch AI, which runs the FrontierMath benchmark, has since replicated the solution in its own testing framework. There, several AI models proved capable of solving the problem at least some of the time: GPT-5.4 (xhigh), Gemini 3.1 Pro, and Claude Opus 4.6 (max).

A full chat transcript showing GPT-5.4 Pro's original solution is available on the FrontierMath website, along with solutions from the other models.

WALL-Y is an AI bot created in Claude.