
Peter Steinberger’s small OpenClaw team uses roughly 100 Codex model instances to automate code writing, PR review and security fixes; the experiment logged 603 billion tokens, 7.6 million requests and a $1.3 million 30‑day OpenAI bill.
Peter Steinberger is running a production‑scale experiment that uses about 100 Codex model instances to automate substantial parts of software development for the open‑source project OpenClaw. A core team of roughly three people sets priorities and provides oversight while autonomous agents write code, review pull requests and open fix PRs on their own, a setup that shifts humans toward steering rather than authoring every change. The trial shows that agent‑driven workflows can speed iteration, but at a high compute cost.
The agents handle a wide range of routine engineering tasks: reviewing pull requests, locating security holes in commits, deduplicating incoming issues, writing fixes and opening PRs guided by the project’s stated vision. Some agents continuously monitor benchmarks and post regression alerts in Discord; others listen to team meetings and proactively create pull requests for discussed features. The deployment integrates Codex‑driven agents with security tooling such as Clawpatch.ai, Vercel’s Deepsec and Codex Security, while chat channels and CI‑style benchmark monitors close the loop on regressions and triage.
Operational metrics show the scale of the experiment. Over a 30‑day period the OpenAI API bill reached $1.3 million, covering 603 billion tokens and 7.6 million requests, and Steinberger reported GPT‑5.5 as the top model used. He also said OpenAI is covering the bill for this work, removing a direct cost burden from his team for the trial. Steinberger noted that a single setting change, turning off a “Fast Mode” option, could reduce immediate costs by roughly 70 percent.
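The reported figures can be turned into rough unit costs with simple arithmetic. The sketch below is illustrative only: the `UsageReport` class and its method names are hypothetical, the per‑token figure is a blended average across all token types rather than any published price, and the Fast Mode projection assumes the "roughly 70 percent" saving applies uniformly to the whole bill with unchanged usage.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UsageReport:
    """Hypothetical container for the 30-day figures quoted in the article."""
    cost_usd: float
    tokens: float
    requests: float

    def usd_per_million_tokens(self) -> float:
        # Blended average over all token types, not a list price.
        return self.cost_usd / self.tokens * 1_000_000

    def tokens_per_request(self) -> float:
        return self.tokens / self.requests

    def usd_per_request(self) -> float:
        return self.cost_usd / self.requests

    def with_cost_reduction(self, fraction: float) -> "UsageReport":
        # Assumption: the saving scales the whole bill and usage stays the same.
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        return replace(self, cost_usd=self.cost_usd * (1.0 - fraction))


# Figures as reported: $1.3 million, 603 billion tokens, 7.6 million requests.
reported = UsageReport(cost_usd=1.3e6, tokens=603e9, requests=7.6e6)

# "Roughly 70 percent" saving from turning off Fast Mode, per Steinberger.
fast_mode_off = reported.with_cost_reduction(0.70)

print(f"Blended cost: ${reported.usd_per_million_tokens():.2f} per million tokens")
print(f"Average request size: {reported.tokens_per_request():,.0f} tokens")
print(f"Average cost per request: ${reported.usd_per_request():.3f}")
print(f"Projected 30-day bill with Fast Mode off: ${fast_mode_off.cost_usd:,.0f}")
```

Under these assumptions the bill works out to about $2.16 per million tokens, about 79,000 tokens and $0.17 per request, and a projected 30‑day bill of about $390,000 with Fast Mode off.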
Steinberger framed the spend as a deliberate research investment to see what software development looks like when token costs are not a constraint. He emphasized that the project’s artifacts are open source and compatible with leading and open models, and when asked about return on investment he concluded, “I’d say pretty high.” Those artifacts are intended to let other builders replicate and adapt the agent‑driven pipeline.
The experiment highlights a concrete trade‑off for teams that hand development work to agents: faster iteration from agents that proactively open PRs and fix issues, against the compute spend and cost visibility that running them at scale demands. The reported data points (instance count, token usage, request volume, model choice and the option to turn off faster speed modes) give builders tangible knobs to tune when designing their own agent‑driven pipelines and estimating budget impact.