Timothy Gowers says ChatGPT 5.5 Pro produced PhD‑level number‑theory papers in under two hours

5/11/2026, 2:26:52 AM

Fields Medalist Timothy Gowers reports that OpenAI’s ChatGPT 5.5 Pro, given published number‑theory problems, generated PhD‑level arguments and LaTeX preprints in under two hours, including an improvement of an exponential bound to a polynomial one.

Timothy Gowers, a Fields Medalist and Combinatorics Chair at the Collège de France, says OpenAI’s ChatGPT 5.5 Pro produced PhD‑level mathematical research on open problems in number theory and delivered polished LaTeX drafts in under two hours, a sequence of outputs that he says required no mathematical guidance from him. If the results hold up, the episode suggests that the practical bar for human contribution is shifting toward results that large language models cannot reproduce.

Gowers supplied the model with published problems from a paper by number theorist Mel Nathanson about sizes and constructions of sets defined by integer sums. He reports receiving complete arguments and LaTeX drafts generated by the model; Gowers says his own mathematical input on the work was zero and that he checked correctness himself before posting the resulting preprints.
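For readers unfamiliar with "sets defined by integer sums," the following is a minimal sketch of one standard notion, the h-fold sumset of a finite set of integers. The article does not state the exact definitions used in Nathanson's paper, so treating the problems as sumset questions is an assumption, and the function name `h_fold_sumset` and both example sets are hypothetical illustrations.

```python
from itertools import combinations_with_replacement


def h_fold_sumset(a, h):
    """Return all sums of h elements of a, with repetition allowed.

    Assumption: this is the textbook h-fold sumset hA; the article does not
    say which definition the cited paper uses.
    """
    return {sum(combo) for combo in combinations_with_replacement(sorted(a), h)}


# Two hypothetical 4-element sets, chosen only for illustration.
arithmetic = {0, 1, 2, 3}   # an arithmetic progression: many sums coincide
spread = {0, 1, 4, 16}      # widely spaced elements: all pairwise sums differ

print(len(h_fold_sumset(arithmetic, 2)))  # 7
print(len(h_fold_sumset(spread, 2)))      # 10
```

Both sets have four elements, yet their 2-fold sumsets have 7 and 10 elements respectively. Questions about which sumset sizes can occur, and how large the integers in a set must be to achieve a given size, are the kind of problem where a bound can be exponential or polynomial.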

On one Nathanson problem where the published work gave only an exponential bound and asked whether it could be improved, ChatGPT 5.5 Pro returned, after 17 minutes and 5 seconds, a construction achieving a quadratic bound, which Gowers describes as the best possible for that case. The model then produced a LaTeX preprint in 2 minutes and 23 seconds documenting the construction and argument.

A separate, more general problem had prior work by MIT student Isaac Rajagopal showing an exponential dependence. After Gowers provided Rajagopal’s paper, the model produced an initial improvement in 16 minutes and 41 seconds that Rajagopal called a routine modification. When pushed further, the model reported additional progress after 13 minutes and 33 seconds but flagged two technical statements; it then performed 9 minutes and 12 seconds of internal checking, and the finished preprint was ready in 31 minutes and 40 seconds.
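As a quick arithmetic check on the "under two hours" figure, the sketch below adds up the six durations reported for the two problems. It assumes the reported steps are separate, non-overlapping intervals and that no other model time is involved; the article does not confirm either point. The step labels and the helper name `total_reported_time` are illustrative.

```python
from datetime import timedelta

# Durations as reported in the article: (label, minutes, seconds).
# Assumption: the steps are sequential and do not overlap.
REPORTED_STEPS = [
    ("Nathanson problem: construction", 17, 5),
    ("Nathanson problem: LaTeX preprint", 2, 23),
    ("General problem: initial improvement", 16, 41),
    ("General problem: further progress", 13, 33),
    ("General problem: internal checking", 9, 12),
    ("General problem: finished preprint", 31, 40),
]


def total_reported_time(steps):
    """Sum (label, minutes, seconds) entries into a single timedelta."""
    return sum(
        (timedelta(minutes=m, seconds=s) for _, m, s in steps),
        timedelta(),
    )


total = total_reported_time(REPORTED_STEPS)
print(total)                       # 1:30:34
print(total < timedelta(hours=2))  # True
```

Under these assumptions the reported model time comes to about 90 and a half minutes, which is consistent with the two-hour claim.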

The most striking technical claim is that the model transformed a previously exponential bound into a polynomial one by finding a way to compress algebraic structures into smaller number ranges while preserving the combinatorial properties in question. An MIT researcher involved described the key idea as "completely original," and Rajagopal judged the final results "almost certainly correct" and the crucial insight "quite ingenious." Gowers emphasizes both the speed and the independence of the process: model proposals, stepwise checks and rapid LaTeX drafting all occurred in under two hours and, he says, required no clever prompting. Coauthors noted varying levels of novelty across steps, and Gowers posted the outputs as preprints after his own verification.

For builders and researchers, the episode demonstrates that current large language models can rapidly produce nontrivial, verifiable mathematical arguments and polished drafts. The report combines concrete timings, cited input papers and a repeatable workflow: the model proposes an idea, drafts a proof, flags technical gaps, performs internal checks and outputs a preprint ready for human scrutiny. It also underscores that careful human verification and formal peer review remain necessary.

Sources

The Decoder AI · 5/9/2026
