AlphaProof Nexus autonomously proves nine Erdős problems at a few hundred dollars each

5/25/2026, 11:07:54 AM

AlphaProof Nexus, a research framework from DeepMind, autonomously produced machine-checkable proofs for nine of the 353 open Erdős problems it attempted, including two problems that had remained unsolved for 56 years. The system also proved 44 of 492 OEIS conjectures, settled a 15-year-old question about Hilbert functions in algebraic geometry, and improved a known bound in convex optimization. The reported inference cost was only a few hundred dollars per solved problem, which suggests that targeted automated theorem proving can be cost-efficient.

The framework combines the Gemini 3.1 Pro language model with the Lean proof assistant. In operation, the LLM emits proof steps directly in Lean syntax, and the Lean compiler type-checks and verifies each step. Compiler error messages are fed back into the LLM to guide subsequent attempts, creating an automated loop that produces machine-checkable proofs; human researchers intervene only at the final stage, to validate the results.
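The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: `propose` and `check` are hypothetical stand-ins for the language model and the Lean compiler, and the toy versions below only mimic their behavior so the example runs without either.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not a real Gemini or Lean API):
#   propose(statement, last_error) -> candidate proof text
#   check(candidate)               -> (accepted, error_message)
Propose = Callable[[str, Optional[str]], str]
Check = Callable[[str], Tuple[bool, str]]


def prove_with_feedback(
    statement: str, propose: Propose, check: Check, max_attempts: int = 8
) -> Tuple[Optional[str], int]:
    """Generate, check, and retry, feeding each checker error into the next attempt."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(statement, feedback)
        accepted, error = check(candidate)
        if accepted:
            return candidate, attempt
        feedback = error  # the checker's message guides the next proposal
    return None, max_attempts


# Toy stand-ins so the sketch runs on its own.
def toy_propose(statement: str, feedback: Optional[str]) -> str:
    # First try leaves a gap; after seeing an error it closes the goal.
    tactic = "sorry" if feedback is None else "rfl"
    return f"theorem t : {statement} := by {tactic}"


def toy_check(candidate: str) -> Tuple[bool, str]:
    # Stricter than Lean, which only warns on `sorry`; here a gap is a failure.
    if "sorry" in candidate:
        return False, "declaration uses 'sorry'"
    return True, ""


proof, attempts = prove_with_feedback("2 + 2 = 4", toy_propose, toy_check)
print(attempts, proof)
```

In a real setup, `check` would invoke the Lean toolchain on the candidate file and return its diagnostics; how AlphaProof Nexus structures that call is not described in the article.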

AlphaProof Nexus was tested with four agent variants, labeled A through D. Agent A implements the simplest loop (LLM → Lean → compiler feedback). Agent B adds AlphaProof, a reinforcement-learning module designed for olympiad-style reasoning. Agent C introduces an evolutionary layer in which sub-agents share a population of proof sketches; rating agents built on Gemini 3.0 Flash score and rank candidates using an Elo system. Agent D combines all of these capabilities and was the variant used in the primary Erdős experiments.
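For readers unfamiliar with Elo ranking, the following sketch shows how pairwise judgments can be turned into a ranking of proof sketches. It uses the standard Elo update; the article does not say which K-factor, starting rating, or comparison scheme Agent C uses, so the values and the `verdicts` list here are assumptions for illustration only.

```python
from typing import Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Return updated ratings; score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw."""
    e_a = expected_score(r_a, r_b)
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta  # zero-sum: what A gains, B loses


# Hypothetical population of proof sketches, all starting at an assumed rating of 1000.
ratings = {"sketch_1": 1000.0, "sketch_2": 1000.0, "sketch_3": 1000.0}

# Hypothetical verdicts from a rating agent, as (winner, loser) pairs.
verdicts = [("sketch_2", "sketch_1"), ("sketch_2", "sketch_3"), ("sketch_3", "sketch_1")]

for winner, loser in verdicts:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], 1.0)

ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

A rating scheme like this only needs relative judgments ("which sketch looks more promising?"), which is one plausible reason to prefer it over asking a model for absolute scores.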

A post-hoc analysis produced a notable finding: the simple Agent A eventually generated proofs for all nine solved Erdős problems as well, though it required more attempts and greater compute on the hardest instances. Performance nevertheless varied by problem. Instances such as erdos_26 showed high success across multiple agents, while harder tasks like erdos_125 and erdos_152 revealed gaps where Agent D reached solutions with fewer attempts and less compute.

The experiments highlight two practical lessons for builders. First, grounding LLM reasoning with symbolic compiler feedback substantially improves logical reliability and produces proofs that are machine checkable. Second, increasingly capable LLMs may shift preferences away from heavily specialized systems toward simpler agentic loops that rely on iterative feedback. These effects were observable across the agent variants and influenced which design choices yielded the fastest solves on particular problems.

Despite these gains, the approach remains limited in scope. The overall solve rate on the attempted Erdős problems was about 2.5 percent, and most open problems remained out of reach. Achieving cost-effective, general automation for deep mathematical discovery therefore remains an open engineering challenge, even as targeted, verifiable progress is now demonstrably attainable.
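The solve rates follow directly from the counts reported above. A quick check, using only the figures given in this article:

```python
# Counts as reported in the article: (solved, attempted).
results = {
    "Erdős problems": (9, 353),
    "OEIS conjectures": (44, 492),
}

# Solve rate in percent, rounded to one decimal place.
rates = {name: round(100 * solved / attempted, 1) for name, (solved, attempted) in results.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

This gives roughly 2.5 percent for the Erdős set and roughly 8.9 percent for the OEIS conjectures.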

Sources

The Decoder AI · 5/25/2026
