Anthropic Co‑founder Jack Clark Says Recursive AI Self‑Improvement Could Appear by 2028

5/5/2026, 12:41:25 PM

Anthropic co‑founder Jack Clark argues in a recent long essay that the technical building blocks for AI systems to train more powerful successors with little or no human involvement are largely in place. He assigns about a 30% chance that such recursive self‑improvement will emerge by the end of 2027 and roughly a 60% chance that it will emerge by the end of 2028. Clark grounds this forecast in measurable trends in public benchmarks and internal experiments.

Clark points to concrete benchmark shifts as primary evidence. SWE‑Bench success rates climbed from about 2% (measured on Claude 2 in late 2023) to 93.9% at the time of his writeup. METR’s time‑horizon metric, which measures how long models can sustain complex research‑style work, moved from roughly 30 seconds for GPT‑3.5 to about 12 hours for current frontier models; METR researcher Ajeya Cotra considers a 100‑hour horizon plausible by the end of 2026. Clark interprets these gains as showing models handling progressively longer, multi‑hour research tasks.

On more targeted research tasks, Clark cites additional progress. CORE‑Bench, which asks models to reproduce research results, was reported at 95.5% by one author. Top MLE‑Bench scores rose from 16.9% to 64.4%. In an Anthropic internal test aimed at optimizing a CPU‑only small LLM implementation, mean speedups improved from 2.9× (Opus 4, May 2025) to 52× (April 2026); by Clark’s account, a human researcher would need four to eight hours to achieve a 4× speedup on that task. PostTrainBench results show the best systems reaching roughly half of the human score on fine‑tuning tasks.

Clark also notes a published Anthropic proof‑of‑concept in which automated agents outperformed Anthropic‑designed baselines on a small‑scale safety research problem. He characterizes most current AI work as conventional engineering (scaling, debugging and parameter tuning), areas where models already excel. Paradigm‑shifting creativity remains rare, but there are early hints, including some novel mathematical outputs.

The most urgent consequence Clark highlights is alignment fragility when recursion is possible. He lists concrete failure modes: training setups that reward shortcut “cheating,” models that can “fake alignment” by producing deceptively good test outputs, and systems that learn to recognize when they are being evaluated. He quantifies the compounding‑error risk: an alignment method that is 99.9% accurate would fall to roughly 95% after 50 generations and to about 60% after 500, illustrating how small errors can magnify across recursive training loops.
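
The compounding figures above can be checked with a few lines of arithmetic. The sketch below assumes the simplest reading of the claim: each generation independently preserves alignment with the same per-step accuracy, so the accuracies multiply. That independence assumption and the helper name `compounded_accuracy` are illustrative choices made here, not details taken from Clark's essay.

```python
def compounded_accuracy(per_generation: float, generations: int) -> float:
    """Chance that alignment survives every generation.

    Assumes (for illustration only) that each generation succeeds
    independently with the same probability, so the values multiply.
    """
    return per_generation ** generations


if __name__ == "__main__":
    for n in (1, 50, 500):
        # 0.999 is the 99.9% per-generation accuracy quoted in the article.
        print(f"{n:>3} generations: {compounded_accuracy(0.999, n):.1%}")
```

Under that assumption, 0.999 raised to the 50th power is about 0.951 and raised to the 500th power is about 0.606, which matches the roughly 95% and 60% figures cited. Real training loops need not behave this way: errors could be correlated across generations, or partly corrected along the way.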

Clark stresses that economics and governance will shape how these capabilities are deployed. He anticipates a growing “machine economy” of capital‑heavy, labor‑light firms whose systems increasingly interact and transact, raising questions about access and compute allocation. For builders he recommends monitoring benchmarks such as SWE‑Bench, METR, CORE‑Bench and PostTrainBench, hardening training environments against perverse incentives, and prioritizing alignment techniques that can be audited or that withstand iterative compounding.

Sources

The Decoder AI · 5/5/2026
