AI Solves Open Math Problem, Proving Advanced Reasoning Capabilities
By admin | Jan 14, 2026 | 3 min read
During a weekend experiment, software engineer, former quantitative researcher, and startup founder Neel Somani decided to assess the mathematical capabilities of OpenAI’s latest model. While testing its problem-solving skills, he encountered an unexpected outcome. After entering a challenging problem into ChatGPT and letting it work for fifteen minutes, he returned to find a complete solution. Somani reviewed the proof and verified it using a tool from Harmonic, and everything checked out.
“I wanted to set a baseline for understanding when large language models can effectively tackle open mathematical problems compared to areas where they still struggle,” Somani explained. The surprise was that, with this newest model, the boundary of what AI can achieve appears to be expanding. ChatGPT’s reasoning process is particularly striking, as it seamlessly references mathematical principles such as Legendre’s formula, Bertrand’s postulate, and the Star of David theorem.
Eventually, the model located a MathOverflow post from 2013 in which Harvard mathematician Noam Elkies provided an elegant solution to a related problem. However, ChatGPT’s final proof differed from Elkies’ work in significant ways and offered a more comprehensive solution to a version of the problem originally posed by the legendary mathematician Paul Erdős. Erdős’ extensive collection of unsolved problems has increasingly become a testing ground for artificial intelligence.
For those skeptical of machine intelligence, this result is surprising—and it is not an isolated case. AI tools are now commonplace in mathematics, ranging from formalization-focused models like Harmonic’s Aristotle to literature review systems such as OpenAI’s deep research. But since the release of GPT-5.2—which Somani describes as “anecdotally more skilled at mathematical reasoning than previous iterations”—the growing number of solved problems has become hard to overlook, raising fresh questions about the ability of large language models to advance human knowledge.
Somani focused specifically on the Erdős problems, a set of over a thousand conjectures by the Hungarian mathematician that are curated online. These problems have become an appealing target for AI-driven mathematics, varying widely in both subject matter and difficulty. The first wave of autonomous solutions emerged in November from a Gemini-powered model called AlphaEvolve, but more recently, Somani and others have observed GPT-5.2 demonstrating remarkable proficiency with advanced math.
Since Christmas, fifteen problems have been reclassified from “open” to “solved” on the Erdős website—and eleven of those solutions explicitly credit AI models for their contribution. On his GitHub page, the esteemed mathematician Terence Tao offers a more detailed perspective on this progress, noting eight distinct problems where AI models made meaningful independent advances on an Erdős conjecture, along with six other cases where progress came from identifying and building upon earlier research.
While AI systems remain far from conducting mathematics entirely without human involvement, it is evident that large models are beginning to play a significant role. On Mastodon, Tao suggested that the scalable nature of AI makes it “better suited for being systematically applied to the ‘long tail’ of obscure Erdős problems, many of which actually have straightforward solutions.” “As such, many of these easier Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means,” Tao added.
Another contributing factor is a recent shift toward formalization, a labor-intensive process that makes mathematical reasoning easier to verify and extend. Formalization does not inherently require AI or even computers, but a new generation of automated tools has significantly streamlined the workflow.
The open-source “proof assistant” Lean, developed at Microsoft Research in 2013, has gained widespread adoption in the field as a means of formalizing proofs. Meanwhile, AI tools like Harmonic’s Aristotle aim to automate much of the formalization process. For Harmonic founder Tudor Achim, the sudden increase in solved Erdős problems is less noteworthy than the fact that leading mathematicians are beginning to take these tools seriously.
“I care more about the fact that math and computer science professors are using [AI tools],” Achim stated. “These people have reputations to protect, so when they’re saying they use Aristotle or they use ChatGPT, that’s real evidence.”
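For readers who have not seen a formalized proof, the following is a minimal sketch of what one looks like in Lean 4. The two statements are toy facts about natural numbers, chosen here only for illustration; they are not Erdős problems and are not taken from Somani's proof or from Aristotle's output. The theorem names are hypothetical, while `Nat.zero_add` is a lemma that ships with Lean's core library.

```lean
-- Toy example (an assumption for illustration, unrelated to any Erdős problem).
-- Lean accepts this proof because `n + 0` reduces to `n` by the definition
-- of addition on natural numbers, so both sides are identical.
theorem add_zero_example (n : Nat) : n + 0 = n := by
  rfl

-- The mirrored statement does not hold by mere unfolding, so the proof
-- cites an existing core-library lemma, `Nat.zero_add`.
theorem zero_add_example (n : Nat) : 0 + n = n :=
  Nat.zero_add n
```

The point of the exercise is that Lean checks every step mechanically: if a proof is incomplete or wrong, the file fails to compile. That is what makes a formalized proof easier to verify and extend than one written in prose.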