Mathematicians study ideas by proposing conjectures and proving them as theorems. For centuries, such proofs have been carefully constructed line by line, and most research mathematicians still work this way today. But artificial intelligence is poised to fundamentally change that process. AI assistants nicknamed “co-pilots” are starting to help mathematicians develop proofs, and there is a real chance that they will one day allow humans to solve problems that are currently beyond our reach.
A promising AI co-pilot is being developed at the California Institute of Technology. It can automatically suggest the next steps in a proof and help formulate intermediate mathematical goals, building the logical connective tissue between big steps. “If I’m developing a proof, this new co-pilot gives me multiple suggestions to move forward,” says Animashree Anandkumar, a professor of computing and mathematical sciences at Caltech. Anandkumar and her team introduced the AI co-pilot in a recent preprint paper, which has not yet been peer-reviewed. Crucially, she says, these suggestions “will all be correct.”
The co-pilot is a large language model (LLM), the same type of machine-learning system behind OpenAI’s ChatGPT and Google’s Gemini. Although it was trained differently, it is similar to the technology that powers Google DeepMind’s AlphaProof and AlphaGeometry 2, which together produced complex mathematical proofs at a silver-medal standard at this year’s International Mathematical Olympiad (IMO), a competition for the world’s best high school students. And although LLMs are known to produce plausible-sounding nonsense, incorrect suggestions from the co-pilot are checked and rejected. That is because the Caltech co-pilot is built on software called Lean, which uses rigorous mathematical logic to check whether statements are valid.
Proof by code
In recent years Lean has gained popularity among a small but growing community of users. The open-source software lets mathematicians write their mathematics as code, a process known as formalization. What’s the advantage? Errors don’t get through: in Lean and other proof assistants, the software automatically checks each mathematical statement for correctness. That’s a world away from so-called informal mathematics, where reviewers and colleagues must carefully comb through pages of statements by hand, a process that is prone to human error and can let mistakes slip through.
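To give a flavor of what formalization looks like, here is a minimal sketch in Lean 4. It assumes a recent Lean 4 release, in which the `obtain` and `omega` tactics are available without extra libraries; the definition `IsEven` and the theorem name are hypothetical, written only for this illustration (the Mathlib library has its own notion of evenness). The everyday statement “the sum of two even numbers is even” becomes a precise claim, and Lean accepts the proof only if every step checks out.

```lean
-- Hypothetical definition for this sketch:
-- a natural number n is even if n = 2 * k for some natural number k.
def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k

-- "The sum of two even numbers is even," written as code.
theorem even_add_even (a b : Nat) (ha : IsEven a) (hb : IsEven b) :
    IsEven (a + b) := by
  obtain ⟨j, hj⟩ := ha   -- hj : a = 2 * j
  obtain ⟨k, hk⟩ := hb   -- hk : b = 2 * k
  -- Offer j + k as the witness; `omega` verifies a + b = 2 * (j + k)
  -- from hj and hk by linear arithmetic.
  exact ⟨j + k, by omega⟩

-- A false claim is rejected: uncommenting the next line makes Lean
-- report an error, because 3 = 2 * 1 does not hold.
-- theorem three_even : IsEven 3 := ⟨1, rfl⟩
```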
If you’re writing a proof with the help of the Caltech co-pilot, you can click a button to request new lines of Lean code that represent the math you’re working on. Several options, which Anandkumar calls “tactic suggestions,” appear on the right side of the screen; you then simply choose the one that suits you best. If your proof is heading in a direction with obvious or known intermediate steps, the co-pilot can also suggest how to complete that route.
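In Lean, a proof is often written as a sequence of tactics, small instructions that each transform the current goal; these are the kinds of lines a tactic-suggestion tool proposes. The sketch below is hypothetical and is not output from the Caltech co-pilot. It proves that 2n + 1 ≤ n² whenever n ≥ 3, using `have` to state an intermediate goal and the built-in `omega` tactic (assumed available, as in recent Lean 4 releases) to finish with linear arithmetic.

```lean
-- Hypothetical theorem name; core Lean 4 only, no Mathlib.
theorem two_n_add_one_le_sq (n : Nat) (h : 3 ≤ n) : 2 * n + 1 ≤ n * n := by
  -- Intermediate goal: multiply both sides of 3 ≤ n by n.
  have h1 : 3 * n ≤ n * n := Nat.mul_le_mul h (Nat.le_refl n)
  -- What remains is linear: 2n + 1 ≤ 3n ≤ n * n because n ≥ 3.
  omega
```

Deleting the `have` line makes `omega` fail, since it treats the product `n * n` as an opaque quantity; supplying that intermediate goal is the kind of step an assistant can help with.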
“There’s no trust issue” with Lean because the software verifies the work, says Martin Hairer, a professor of pure mathematics at the Swiss Federal Institute of Technology in Lausanne and at Imperial College London. Yet many academics have not embraced it. “It’s hard to use because you have to enter all the math as code,” says Hairer. Coding in Lean requires spelling out details that would be left out of a paper, he says, so it can take more than a page of code to prove something that is self-evidently true.
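Hairer’s point about omitted details shows up even in tiny examples. In the following Lean 4 sketch (theorem names hypothetical; core Lean already provides these facts as `Nat.add_zero` and `Nat.zero_add`), n + 0 = n holds by definition, because Lean’s addition on natural numbers recurses on its second argument, whereas the equally “obvious” 0 + n = n needs an explicit proof by induction.

```lean
-- n + 0 = n is true by definition, so `rfl` ("both sides compute to
-- the same thing") is accepted as the whole proof.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl

-- 0 + n = n is just as obvious on paper, but Lean needs an argument.
theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih =>
    -- Restate the goal in a uniform form, then regroup the sum
    -- (0 + (k + 1) becomes 0 + k + 1) and apply ih : 0 + k = k.
    show 0 + (k + 1) = k + 1
    rw [← Nat.add_assoc, ih]
```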
But Hairer, who is not involved in the Caltech project, believes that AI co-pilots will eventually eliminate that tedious work. “When you present a statement that is obvious to most mathematicians, an LLM should be able to generate code for it,” he says, adding that this faster process “may attract a new generation of mathematicians to Lean.”
Anandkumar also predicts that more researchers will adopt AI-assisted formal mathematics. “Nowadays, when I talk to young mathematicians or undergraduates, I see that they know these AI systems,” she says. “They will do whatever it takes to get the job done faster and easier to gain a competitive advantage.”
Mathematical transformations
Before AI tools can be meaningfully adopted by the international mathematics community, these platforms will need to become much more powerful. By reaching a silver-medal standard at this year’s IMO, Google DeepMind’s AlphaProof and AlphaGeometry 2 have shown impressive results. But they have not yet reached the level at which they could help research mathematicians with complex proofs; in this respect, humans are still the superior mathematicians.
However, “there will soon be systems approaching that level,” says David Silver, vice president of reinforcement learning at Google DeepMind. “I think this will fundamentally move human mathematicians to a place where they are able to operate at a higher level and think about ideas.” Mathematics is beginning to transform, he says, just as it did when the electronic calculator was invented. “In the pre-calculator era, there was a wide range of math that was very tedious and required a lot of effort,” he says. “I think we’re at that stage with proofs now, and in the future we’ll see very complex proofs solved automatically by AI.”
Collaboration through AI
Adopting AI co-pilots will also shake up how mathematicians work with one another. They usually work alone or in small groups, and trusted colleagues check their proofs piece by piece. But formal AI assistants could let larger teams of human collaborators tackle bigger problems by breaking them down into subproblems, each handled by a different team of specialized AI and human collaborators. “Mathematics is often seen as a solitary endeavor, especially in the popular media, but now it looks like AI will become an enabler of collaboration between mathematicians,” says Anandkumar.
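Here is a minimal Lean 4 sketch of that division of labor, using hypothetical names and a deliberately simple statement: the main result is split into two lemmas that different collaborators, human or AI, could prove independently, and Lean then checks that the pieces fit together into the final theorem.

```lean
-- Subproblem 1 (could be assigned to one collaborator):
-- the order on natural numbers is transitive.
theorem step_trans (a b c : Nat) (h1 : a ≤ b) (h2 : b ≤ c) : a ≤ c :=
  Nat.le_trans h1 h2

-- Subproblem 2 (could be assigned to another):
-- squaring preserves order on natural numbers.
theorem step_sq (x y : Nat) (h : x ≤ y) : x * x ≤ y * y :=
  Nat.mul_le_mul h h

-- Main result, assembled from the two pieces; Lean verifies that the
-- lemmas' statements match exactly what the final argument needs.
theorem main_result (a b c : Nat) (h1 : a ≤ b) (h2 : b ≤ c) :
    a * a ≤ c * c :=
  step_sq a c (step_trans a b c h1 h2)
```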
Mathematicians are generally optimistic that AI co-pilots will soon give human experts a boost, allowing them to spend time on more complex and difficult problems. For example, in the coming years AI-human collaborations might take on the notoriously difficult Millennium Prize Problems, perhaps including P versus NP, a long-standing question in theoretical computer science that asks whether every problem whose solution can be quickly verified can also be quickly solved.
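To make the gap between “quickly verified” and “quickly solved” concrete, here is a small Python sketch using subset sum (does some subset of a list of numbers add up to a target?), a standard NP-complete problem chosen here for illustration. The function names are hypothetical. Checking a proposed answer takes a single pass over it, while the naive solver may try all 2^n subsets. No algorithm is known that solves every instance in time polynomial in the input size, and because subset sum is NP-complete, finding one would settle P versus NP.

```python
from itertools import combinations


def verify_subset_sum(numbers, target, certificate):
    """Check a proposed solution, given as a list of indices into `numbers`.

    This takes time roughly linear in the size of the certificate.
    """
    if len(set(certificate)) != len(certificate):
        return False  # each number may be used at most once
    if not all(0 <= i < len(numbers) for i in certificate):
        return False  # every index must point into the list
    return sum(numbers[i] for i in certificate) == target


def solve_subset_sum(numbers, target):
    """Search for a solution by brute force, trying up to 2**n subsets.

    Returns a list of indices whose values sum to `target`, or None.
    """
    indices = range(len(numbers))
    for size in range(len(numbers) + 1):
        for combo in combinations(indices, size):
            if sum(numbers[i] for i in combo) == target:
                return list(combo)
    return None
```

For example, with `numbers = [3, 34, 4, 12, 5, 2]` and `target = 9`, `solve_subset_sum` returns `[2, 4]` (the indices of 4 and 5), and `verify_subset_sum` confirms that answer instantly.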
Silver sees such questions as a goal for the future rather than the present. “Complex problems like ‘P versus NP,’ or something that’s difficult, are beyond where we are today, in terms of even knowing how to begin,” he says.