GitHub trims Effective Tokens by up to 62% in agentic CI with MCP pruning, gh CLI swaps and daily auditors

News

5/29/2026, 2:15:35 PM


GitHub reports up to a 62% reduction in Effective Tokens (ET) across its agentic CI workflows after applying three operational changes: pruning unused Model Context Protocol (MCP) tool manifests, swapping some MCP calls for gh CLI operations, and running daily Auditor and Optimiser agents. The reductions matter because ET is used as a proxy for LLM cost in CI; lowering ET directly targets operating expense for repositories that run model-driven automation.

To measure usage, every agent run now writes a token-usage.jsonl artefact via an API proxy that captures input, output and cache tokens in a normalised format. That unified trace supports Claude CLI, Copilot CLI and Codex CLI and gives consistent proxy-level observability across model providers and agent runtimes. GitHub converts those raw counts into Effective Tokens by weighting output tokens 4× and cache reads 0.1×, then applying a model multiplier (Haiku 0.25×, Sonnet 1.0×, Opus 5.0×).
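The weighting can be sketched in a few lines of Python. This is a minimal illustration under two assumptions: input tokens count at 1× (the article states only the output and cache-read weights), and each JSONL record uses the hypothetical field names model, input_tokens, output_tokens and cache_read_tokens, since the real token-usage.jsonl schema is not described.

```python
import json

# Weights reported in the article: output tokens count 4x, cache reads 0.1x.
# ASSUMPTION: input tokens count 1x; the article does not state this explicitly.
OUTPUT_WEIGHT = 4.0
CACHE_READ_WEIGHT = 0.1

# Per-tier model multipliers from the article (keys are hypothetical labels).
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}


def effective_tokens(input_tokens, output_tokens, cache_read_tokens, model):
    """Weight raw token counts, then scale by the model-tier multiplier."""
    weighted = (
        input_tokens
        + OUTPUT_WEIGHT * output_tokens
        + CACHE_READ_WEIGHT * cache_read_tokens
    )
    return weighted * MODEL_MULTIPLIER[model]


def effective_tokens_from_jsonl(path):
    """Sum ET over every record in a token-usage.jsonl-style file.

    The field names read here are hypothetical, not the real artefact schema.
    """
    total = 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            rec = json.loads(line)
            total += effective_tokens(
                rec["input_tokens"],
                rec["output_tokens"],
                rec.get("cache_read_tokens", 0),
                rec["model"],
            )
    return total


print(effective_tokens(10_000, 2_000, 50_000, "opus"))  # 115000.0
```

Under these assumptions, 10,000 input, 2,000 output and 50,000 cache-read tokens come to 23,000 ET on Sonnet, 5,750 on Haiku and 115,000 on Opus, which is why the same workload can look very different in cost depending on the model tier.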

The team treats a 10% ET drop as equivalent to a 10% cost reduction regardless of model tier, enabling apples-to-apples optimisation decisions. Two automated agentic workflows implement the optimisation loop. A Daily Token Usage Auditor aggregates consumption by workflow, detects anomalous runs and highlights the most expensive jobs. When the Auditor flags a problem, a Daily Token Optimiser inspects the source and recent logs, opens a GitHub issue against the repository and proposes concrete fixes. Both agents appear in the same daily reports, so their own token cost is visible and accounted for.
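The Auditor's aggregation step might look roughly like the sketch below. The (workflow, run_id, ET) tuples, the median-based anomaly rule and the 2× threshold are illustrative assumptions rather than GitHub's published logic, and the Optimiser's issue-opening step is left out.

```python
from collections import defaultdict
from statistics import median


def audit(runs, anomaly_factor=2.0, top_n=3):
    """Aggregate ET per workflow, flag outlier runs and rank the costliest workflows.

    runs is an iterable of (workflow, run_id, effective_tokens) tuples.
    ASSUMPTION: a run is flagged when its ET exceeds anomaly_factor times
    the median ET of its own workflow.
    """
    by_workflow = defaultdict(list)
    for workflow, run_id, et in runs:
        by_workflow[workflow].append((run_id, et))

    # Total consumption per workflow.
    totals = {wf: sum(et for _, et in rs) for wf, rs in by_workflow.items()}

    # Flag runs far above their workflow's typical (median) cost.
    anomalies = []
    for wf, rs in by_workflow.items():
        baseline = median(et for _, et in rs)
        anomalies.extend(
            (wf, run_id, et) for run_id, et in rs if et > anomaly_factor * baseline
        )

    # Highest-total workflows first.
    most_expensive = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return totals, anomalies, most_expensive


# Hypothetical runs for two hypothetical workflows.
runs = [
    ("workflow-a", 1, 100), ("workflow-a", 2, 110), ("workflow-a", 3, 500),
    ("workflow-b", 4, 50), ("workflow-b", 5, 60),
]
totals, anomalies, most_expensive = audit(runs)
print(anomalies)       # [('workflow-a', 3, 500)]
print(most_expensive)  # [('workflow-a', 710), ('workflow-b', 110)]
```

A per-workflow median keeps one expensive run from inflating its own baseline, which a mean would not; any robust baseline would serve the same purpose.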

A frequent source of wasted context is unused MCP tool manifests. Because LLM APIs are stateless, agent runtimes include tool schemas with every request: a server hosting about 40 tools can add roughly 10 to 15 KB of schema per turn. In GitHub’s smoke tests, removing unused entries reduced per-call context by about 8 to 12 KB, directly lowering ET. Swapping some MCP calls for gh CLI operations moves that work, and the tokens it consumes, away from the agent. Those swaps remove repeated model-facing context and avoid additional LLM calls.
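Manifest pruning amounts to filtering the tool list down to the tools a workflow actually calls before the runtime serialises it into each request. The sketch assumes tool entries shaped like MCP tool definitions (name, description, inputSchema) and a hypothetical list of used tool names, for example gathered from past run logs; serialised JSON size stands in for per-turn context.

```python
import json


def manifest_bytes(tools):
    """Size of the serialised tool list, a rough proxy for per-turn schema overhead."""
    return len(json.dumps(tools).encode("utf-8"))


def prune_manifest(tools, used_tool_names):
    """Keep only the tools the workflow actually calls, preserving order."""
    used = set(used_tool_names)
    return [tool for tool in tools if tool["name"] in used]


# Forty hypothetical tools, each with a small JSON Schema for its input.
tools = [
    {
        "name": f"tool_{i}",
        "description": f"Hypothetical tool number {i}",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
    for i in range(40)
]

# ASSUMPTION: these three names were observed in the workflow's recent runs.
pruned = prune_manifest(tools, ["tool_0", "tool_7", "tool_12"])
saved = manifest_bytes(tools) - manifest_bytes(pruned)
print(len(pruned), "tools kept;", saved, "bytes saved per request")
```

Because the schema is resent on every turn, the saving is multiplied by the number of turns in a run, which is why a few kilobytes per call adds up.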

Measured over a dozen production workflows, the team reported sustained ET improvements: Auto‑Triage Issues saw a 62% reduction across 109 post-fix runs, Security Guard fell 43%, Smoke Claude dropped 59% and Daily Community Attribution improved 37%. One workflow, Contribution Check, showed a 5% rise in ET, which GitHub attributes to a workload shift toward larger pull requests rather than an optimisation regression. The team also cautions that MCP pruning has diminishing returns when tool manifests are a small fraction of total context.
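The percentages above are relative changes in ET, and the caution about diminishing returns follows from simple arithmetic: pruning can remove at most the manifest's share of the input context. A small sketch with hypothetical numbers (not GitHub's measurements):

```python
def percent_change(before_et, after_et):
    """Relative change in ET, in percent; negative values are reductions."""
    return 100.0 * (after_et - before_et) / before_et


def max_input_reduction(manifest_tokens, total_input_tokens, pruned_fraction):
    """Upper bound (in percent) on the input-context reduction from pruning.

    Removing a fraction of the manifest shrinks input context by at most
    pruned_fraction * (manifest share of input), so a small manifest share
    caps the benefit however aggressively the manifest is pruned.
    """
    return 100.0 * pruned_fraction * manifest_tokens / total_input_tokens


# Hypothetical figures, for illustration only.
print(percent_change(1_000, 380))               # -62.0, i.e. a 62% reduction
print(max_input_reduction(4_000, 20_000, 0.9))  # 18.0: manifest is 20% of input
print(max_input_reduction(1_000, 50_000, 0.9))  # 1.8: manifest is 2% of input
```

Pruning 90% of the manifest cuts input context by up to 18% when the manifest is a fifth of it, but by under 2% when it is a fiftieth, which matches the team's warning.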

Sources

InfoQ AI/ML · 5/29/2026
