5/8/2026, 12:01:55 AM

In April 2026 GitHub began systematically measuring token consumption in its production Agentic Workflows, using an API proxy and per-call artifacts to locate waste.

In April 2026 GitHub launched a systematic program to measure and reduce token consumption in the agentic workflows it runs as GitHub Actions, after discovering that automatically scheduled, per-pull-request agents can quietly accumulate large API bills. The effort instruments production runs for repo maintenance and CI automations and is designed to surface where LLM calls are consuming the most tokens so teams can reduce costs without changing behavior.

To collect precise metrics, GitHub routed agent calls through an API proxy that normalizes usage data across different agent frameworks, including Claude CLI, Copilot CLI, and Codex CLI. Each workflow now emits a token-usage.jsonl artifact containing one record per API call: input tokens, output tokens, cache-read tokens, cache-write tokens, model, provider, and timestamps. Those per-call artifacts, combined with workflow logs, produce a historical, per-workflow view of token spend.
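As a rough illustration of how such an artifact might be consumed, the Python sketch below sums per-call records by model. The key names (`input_tokens`, `cache_read_tokens`, and so on), the model and provider labels, and the sample values are assumptions made for this example; the source lists the recorded fields but not their exact names.

```python
import json
from collections import defaultdict

# Hypothetical key names: the article names the fields, not their exact spelling.
TOKEN_FIELDS = ("input_tokens", "output_tokens", "cache_read_tokens", "cache_write_tokens")

# Made-up sample records standing in for one workflow's token-usage.jsonl artifact.
SAMPLE_RECORDS = [
    {"model": "model-a", "provider": "provider-x", "timestamp": "2026-04-01T00:00:00Z",
     "input_tokens": 1200, "output_tokens": 300, "cache_read_tokens": 800, "cache_write_tokens": 0},
    {"model": "model-a", "provider": "provider-x", "timestamp": "2026-04-01T00:00:05Z",
     "input_tokens": 900, "output_tokens": 150, "cache_read_tokens": 1000, "cache_write_tokens": 200},
    {"model": "model-b", "provider": "provider-y", "timestamp": "2026-04-01T00:00:09Z",
     "input_tokens": 500, "output_tokens": 100, "cache_read_tokens": 0, "cache_write_tokens": 0},
]
SAMPLE_JSONL = "\n".join(json.dumps(record) for record in SAMPLE_RECORDS)


def summarize_usage(jsonl_text):
    """Sum token counts per model from JSONL text (one JSON object per line)."""
    totals = defaultdict(lambda: {"calls": 0, **{field: 0 for field in TOKEN_FIELDS}})
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the artifact
        record = json.loads(line)
        bucket = totals[record["model"]]
        bucket["calls"] += 1
        for field in TOKEN_FIELDS:
            bucket[field] += record.get(field, 0)
    return dict(totals)


if __name__ == "__main__":
    for model, stats in summarize_usage(SAMPLE_JSONL).items():
        print(model, stats)
```

Grouping by model is only one choice; the same loop could key on workflow name or provider to build the per-workflow view the article describes.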

GitHub added two daily agentic workflows to act on that telemetry. A Daily Token Usage Auditor aggregates recent token artifacts, ranks workflows by consumption, flags significant increases, and surfaces anomalous runs, such as a job that normally completes in four LLM turns taking 18. When the Auditor identifies a target, a Daily Token Optimizer inspects the workflow source and logs and automatically opens a GitHub issue that details concrete inefficiencies and proposes specific fixes.
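The ranking and anomaly checks can be sketched in a few lines of Python. This is not GitHub's Auditor, which is itself an agentic workflow; it is a minimal deterministic approximation. The run fields (`workflow`, `run_id`, `turns`, `total_tokens`), the sample numbers, and the rule "more than three times the workflow's median turn count" are all assumptions chosen for illustration.

```python
from statistics import median

# Made-up run summaries; field names are hypothetical.
SAMPLE_RUNS = [
    {"workflow": "triage", "run_id": 1, "turns": 4, "total_tokens": 10000},
    {"workflow": "triage", "run_id": 2, "turns": 4, "total_tokens": 11000},
    {"workflow": "triage", "run_id": 3, "turns": 4, "total_tokens": 10500},
    {"workflow": "triage", "run_id": 4, "turns": 18, "total_tokens": 52000},
    {"workflow": "docs", "run_id": 5, "turns": 6, "total_tokens": 20000},
    {"workflow": "docs", "run_id": 6, "turns": 7, "total_tokens": 22000},
]


def rank_workflows(runs):
    """Return (workflow, total_tokens) pairs, highest consumer first."""
    totals = {}
    for run in runs:
        totals[run["workflow"]] = totals.get(run["workflow"], 0) + run["total_tokens"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def flag_anomalous_runs(runs, factor=3.0):
    """Flag runs whose turn count exceeds `factor` times their workflow's median."""
    by_workflow = {}
    for run in runs:
        by_workflow.setdefault(run["workflow"], []).append(run)
    flagged = []
    for workflow_runs in by_workflow.values():
        typical = median(run["turns"] for run in workflow_runs)
        flagged.extend(run for run in workflow_runs if run["turns"] > factor * typical)
    return flagged


if __name__ == "__main__":
    print(rank_workflows(SAMPLE_RUNS))
    print(flag_anomalous_runs(SAMPLE_RUNS))
```

With the sample data, the 18-turn run is flagged because the median for its workflow is four turns. A median is used here so that the outlier itself does not shift the baseline much.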

The Optimizer most often flagged unused MCP tool registrations. Many agent runtimes include full MCP tool function names and JSON schemas in every request, which causes the complete tool manifest to become part of each call’s context. For a GitHub MCP server with 40 tools, those schemas can add roughly 10 to 15 KB per turn. Even when the agent actually invokes only two tools, the schemas for the other 38 are resent as overhead on every call.
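To make the repeated-overhead arithmetic concrete, the following Python sketch serializes a manifest of 40 placeholder tools and multiplies the size of the unused portion by the number of turns. The tool definitions are invented for the example (only the `name`, `description`, and `inputSchema` layout follows the MCP tool format), so the byte counts it prints say nothing about the real GitHub MCP server.

```python
import json


def make_tool(index):
    """Build a hypothetical tool definition with a small JSON schema."""
    return {
        "name": f"example_tool_{index}",
        "description": "Hypothetical tool used only to illustrate manifest size.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "owner": {"type": "string", "description": "Repository owner."},
                "repo": {"type": "string", "description": "Repository name."},
            },
            "required": ["owner", "repo"],
        },
    }


def manifest_bytes(tools):
    """Size in bytes of the tool list when serialized as JSON."""
    return len(json.dumps(tools).encode("utf-8"))


def repeated_overhead(tools, used_names, turns):
    """Bytes spent resending schemas of never-invoked tools across all turns."""
    unused = [tool for tool in tools if tool["name"] not in used_names]
    return manifest_bytes(unused) * turns


if __name__ == "__main__":
    tools = [make_tool(i) for i in range(40)]
    used = {"example_tool_0", "example_tool_1"}
    print("full manifest bytes:", manifest_bytes(tools))
    print("overhead over 4 turns:", repeated_overhead(tools, used, turns=4))
```

Bytes are measured rather than tokens because the bytes-to-tokens ratio depends on the model's tokenizer, which this sketch does not assume.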

Applying the Optimizer’s recommendations in smoke tests produced measurable savings: removing unused tools from the MCP configuration reduced per-call context size by about 8 to 12 KB and saved several thousand tokens per run without changing observable workflow behavior. The Auditor also helped surface individual runs with abnormal turn counts and identified the workflows that contribute most to aggregate spend across the estate.
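One way to approach this kind of pruning is to derive an allowlist from the tool names that actually appear in past run logs and keep only those entries. The Python sketch below assumes the invoked names have already been extracted from logs; the manifest, tool names, and helper functions are hypothetical and do not reflect how GitHub's Optimizer or any MCP configuration file works.

```python
import json


def prune_tools(tools, invoked_names):
    """Keep only tools that were invoked; fail if an invoked tool is not in the manifest."""
    keep = set(invoked_names)
    pruned = [tool for tool in tools if tool["name"] in keep]
    missing = keep - {tool["name"] for tool in pruned}
    if missing:
        # An invoked tool absent from the manifest suggests the log or manifest is stale.
        raise ValueError(f"invoked tools not in manifest: {sorted(missing)}")
    return pruned


def bytes_saved(tools, pruned):
    """Difference in serialized size between the full and pruned tool lists."""
    return len(json.dumps(tools).encode("utf-8")) - len(json.dumps(pruned).encode("utf-8"))


# Hypothetical 40-tool manifest and a log-derived list of invoked tool names.
MANIFEST = [
    {"name": f"example_tool_{i}", "description": "Hypothetical tool.",
     "inputSchema": {"type": "object", "properties": {}}}
    for i in range(40)
]
INVOKED = ["example_tool_3", "example_tool_7", "example_tool_3"]

if __name__ == "__main__":
    pruned = prune_tools(MANIFEST, INVOKED)
    print(len(pruned), "tools kept;", bytes_saved(MANIFEST, pruned), "bytes saved per call")
```

The explicit error on a missing tool is a design choice for this sketch: a tool the agent still calls should never be dropped silently, since that would change workflow behavior.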

For builders, GitHub’s experience implies two practical actions: instrument repeatable agentic jobs at the API-call level (YAML-driven workflows make this tractable) and prune tool manifests to avoid passing unnecessary schemas with each request. GitHub notes that the Auditor and Optimizer are themselves agentic and appear in the daily reports, creating a small virtuous cycle of measurement and automated remediation.

Sources

GitHub Generative AI · 5/7/2026
