AI could radically change how math proofs are verified

nymnympseudonym@piefed.social · 28 days ago

A growing AI specter

The problem with large language models and math, to date, has largely been one of accuracy. To be fair, LLMs like those that power ChatGPT and Anthropic’s Claude are better at math problems than anyone expected, and they have improved with new iterations. But they’re not perfect.

“If you go to ChatGPT and ask it to prove a theorem, it spits out a text,” Riehl says. It might sound good and look good and use correct terms, she says. “But there’s nothing in the way that large language models are designed to guarantee that [it’s] correct.” That’s because they are built to generate probable responses to queries, not to guarantee accuracy. And even if an answer is 99 percent correct, she says, that’s not good enough for a math proof. When combined with a theorem prover like Lean, though, LLMs get much better.
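A minimal Lean 4 sketch shows what that combination adds. It assumes only core Lean (no Mathlib), and the theorem name `swap_sum` is invented for illustration. An LLM might propose either proof below; Lean accepts a proof only if it establishes exactly the stated claim, so a plausible-sounding but wrong argument is rejected rather than passed along.

```lean
-- Lean checks that the proof term has exactly the stated type.
-- `Nat.add_comm a b : a + b = b + a` is a lemma from Lean's core library.
theorem swap_sum (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic claim, settled by `decide`, which evaluates
-- the (decidable) proposition and succeeds only if it is true.
example : 12 * 12 = 144 := by decide

-- By contrast, `example : 12 * 12 = 145 := by decide` fails to check:
-- the proposition evaluates to false, so Lean reports an error instead
-- of accepting the claim.
```

The verdict is yes or no: Lean has no notion of a proof that is 99 percent correct, only one that checks and one that does not.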

Last July, the AI company Harmonic made headlines after its program Aristotle, which uses Lean to verify and refine its work, scored high enough for a gold medal, the highest prize, in the annual International Mathematical Olympiad. During this two-day event, participants, all under the age of 20, work through six exceptionally difficult problems. More than 600 human contestants entered the 2025 contest held in Queensland, Australia; 72 scored at least 35 out of a possible 42 points, earning a gold medal. In addition to Aristotle, AI systems developed by Google and OpenAI also produced gold medal–level work.

Some mathematicians didn’t see the olympiad accomplishments as showing anything meaningful about the way math is actually done. But more interesting results soon emerged. In July, Rutgers’ Kontorovich and Terence Tao, a UCLA mathematician and Fields Medalist, announced that progress on their 18-month effort to formalize something called the strong prime number theorem had slowed. But then in September, a company called Math, Inc., supported by a grant from the DARPA expMath project, announced that it had used its program, called Gauss, to finish the task in just three weeks.

Gauss combined Lean with AI language models to autoformalize the remainder of the proof: the AI program translated definitions and arguments into Lean, which then checked the entire argument for accuracy. More recently, in January, researchers reported using Aristotle and GPT-5.2 to generate, formalize and verify a proof of a problem posed by prolific Hungarian mathematician Paul Erdős in 1975. It is the latest in a string of recent proofs of Erdős problems that used AI in some way.
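As a toy illustration of what autoformalization produces (vastly smaller than the prime number theorem project, and written by hand here rather than by Gauss), the informal sentence “the sum of two even numbers is even” can be rendered as a Lean 4 statement and proof. Encoding “even” as `∃ a, m = 2 * a` is an assumption made for this sketch; libraries such as Mathlib use their own definition, and the theorem name is hypothetical.

```lean
-- Informal statement: "The sum of two even numbers is even."
-- Here "m is even" is spelled out as "there is an a with m = 2 * a".
theorem even_add_even (m n : Nat)
    (hm : ∃ a, m = 2 * a) (hn : ∃ b, n = 2 * b) :
    ∃ c, m + n = 2 * c := by
  cases hm with
  | intro a ha =>
    cases hn with
    | intro b hb =>
      -- Take c = a + b; then m + n = 2 * a + 2 * b = 2 * (a + b),
      -- where the last step is distributivity (`Nat.left_distrib`).
      exact ⟨a + b, by rw [ha, hb, Nat.left_distrib]⟩
```

Both halves matter: the translation of the words into a formal statement, and a proof of that statement that Lean can check line by line.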

For now, Buzzard greets advances like these with skepticism. There are no guardrails yet, he says. And even when Lean confirms that AI-generated code is correct, that code may not actually state the theorem the mathematician thought they were proving.
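That gap between a checked proof and the intended theorem can be shown in a few lines. In this hypothetical Lean 4 sketch, the intended claim is “every natural number greater than 5 has a square greater than 20,” but the hypothesis was mistyped as `n < 0`. No natural number satisfies that hypothesis, so the statement is vacuously true and Lean accepts the proof, even though it says nothing about the intended result.

```lean
-- Intended (hypothetical) claim: for every natural number n > 5, n * n > 20.
-- Mis-stated version: the hypothesis reads `n < 0` instead of `n > 5`.
-- No natural number is below 0, so the hypothesis can never hold and the
-- theorem is vacuously true. Lean checks this proof and accepts it.
theorem misstated (n : Nat) (h : n < 0) : n * n > 20 :=
  absurd h (Nat.not_lt_zero n)
```

Lean's green light certifies only that the proof matches the statement as written; someone still has to read the statement and confirm it says what was meant.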

At the same time, Buzzard admits that the picture could change quickly given the rapid pace of AI advancement. So far, he hasn’t seen any AI advances that would help him in his work. But he allows that within five years some tool could emerge that would make short work of formalizing the proof of Fermat’s Last Theorem. “I do wonder whether autoformalization will get to the point where it will just, you know, be able to eat the literature,” Buzzard says.

nymnympseudonym@piefed.social · 28 days ago

Helping humans

Many mathematicians predict that humans will always be necessary in math, but because of the use of AI and formalization, their role could change dramatically.

“The problem-solving aspect of mathematics will basically vanish,” says mathematician and computer scientist Christian Szegedy of Math, Inc. He previously helped develop Google DeepMind’s AlphaProof program and co-led the Elon Musk–founded company xAI. The new job of humans in math, he says, will be “to steer the exploration of mathematics to the areas that we actually care about,” rather than muddling through the logic and fine details of a proof. He sees the rise of AI-driven autoformalization as a path toward a brilliant digital assistant.

“If we digitize mathematics, maybe at some point it will turn math on its head.” (Kevin Buzzard)

Szegedy thinks real progress will be marked by AI’s ability to reason in new and creative ways. He predicts that this year AI systems will achieve “superhuman intelligence” in math, meaning they will be able to solve problems that humans can’t. So far, that hasn’t happened.

Szegedy also predicts that at some point AI models will be better than humans at formalizing proofs, which doesn’t seem out of reach given the fast pace of development in 2025. Soon, he thinks, the models will be able to create a proof from scratch. “And then, the game is over.” By that he doesn’t mean humans will be out of the game; rather, the essential role of the mathematician will be purely creative, relying on an AI collaborator to work out the details.

DARPA’s Shafto, who leads the expMath project, sees the changes as giving mathematicians more time and space to think about ideas rather than details. “If you talk to mathematicians, of course, yes, they prove things and want them to be correct, but that’s not what they’re doing most of the time,” he says. “They’re talking about ideas and how they relate and what might work. Many of them would be happy to have a student or collaborator whom they could trust to sort of prove their tiny lemmas for them.”
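Some of that delegation already happens without AI, in a narrow form. In this hypothetical Lean 4 sketch (the lemma is invented for illustration), the built-in `omega` tactic proves a small inequality with no further guidance. `omega` is a fixed decision procedure for linear arithmetic, not an AI model, and it covers only a limited class of routine facts; the trusted collaborator Shafto describes would have to handle far more.

```lean
-- An invented "tiny lemma": if n ≥ 3, then n + 3 ≤ 2 * n.
-- `omega`, a decision procedure for linear arithmetic over Nat and Int
-- that ships with Lean 4, finds the proof automatically.
theorem tiny_bound (n : Nat) (h : 3 ≤ n) : n + 3 ≤ 2 * n := by
  omega
```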

Others in the field, though, eye the coming AI wave with skepticism and concern for the future. “Many of my colleagues have absolutely no interest in it,” says mathematician Aravind Asok at the University of Southern California in Los Angeles.

In recent years, Asok says, AI companies have recast mathematical accomplishment as a tool of legitimization, turning math itself into a problem to be solved. He finds that notion misguided and “a complete misapprehension of what mathematics is.” Insisting that AI models can solve math, or that accuracy is the primary goal, requires a narrow view of the field.

But it’s a view that has already infiltrated his classroom: Asok says he no longer assigns homework because too many of his graduate students use AI to generate answers. That defeats the purpose. “They need to struggle and engage with [the work] in a way to really build up their own intuitions,” he says. But it’s much faster to ask ChatGPT.

Asok worries that conversations around AI and math focus too closely on correctness. That’s important, he says, “but making mistakes is part of learning.” There have been plenty of mistakes, he adds, that have helped the field of research mathematics move forward.

Formalization is a powerful tool that could help push math in interesting directions, but Asok worries that if students learn math as something to be done with AI, then tomorrow’s mathematicians will lack the creativity needed to find truly new frontiers. “It’s like saying that there’s only one way to have music, or only one way to talk in a conversation,” he says.

Asok also worries that AI may be a threat to the profession because of how progress is perceived. Mathematicians often rely on federal funding, he says, and if the U.S. government adopts the narrative that math itself has been solved by AI companies, support for new work and new ideas could wane. The teaching of math, he says, might be offloaded to AI agents and programs. “I feel like the professional status of mathematicians could change immensely.”

Buzzard maintains that, with or without AI, formalization can help bring math and math education into a modern age. Mathematicians would benefit from an interactive theorem prover with access to verified mathematical knowledge, he says, both to check their own work and as a proving ground for new AI-generated work, in part to separate sloppy code from bona fide advances.

“I just want to make my colleagues’ lives better,” Buzzard says. “I’m not trying to destroy them. I’m actually trying to help them.”
