OpenAI’s published token rates for GPT-5.5 are double those of GPT-5.4: input tokens now cost $5 per million, up from $2.50, and output tokens cost $30 per million, up from $15. An independent look at production traffic shows developers are already paying materially more. OpenRouter used its April 2026 usage logs to translate the list-price change into effective dollars-per-million-token costs for real deployed workloads, revealing sharp variation by prompt length that will affect API spend for many builders.
OpenRouter computed average effective costs across six input-length buckets. For inputs under 2,000 tokens, average cost rose from $4.89/M to $9.37/M (+92%); for 2,000 to 10,000 tokens, from $2.25/M to $3.81/M (+69%); for 10,000 to 25,000 tokens, from $1.42/M to $2.15/M (+51%); for 25,000 to 50,000 tokens, from $1.02/M to $1.65/M (+62%); for 50,000 to 128,000 tokens, from $0.74/M to $1.10/M (+49%); and for 128,000 tokens or more, from $0.71/M to $1.31/M (+85%). These bucketed averages show that the same list-price change maps to very different effective costs depending on workload.
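As a quick arithmetic check, the percentage increases can be recomputed from the before and after figures quoted above. This is only a sketch: the bucket labels and the `pct_change` helper are names chosen here for illustration, and the only data used are the dollar amounts already given in this post.

```python
# Effective cost per million tokens as quoted in this post: (before, after).
# Bucket labels are shorthand chosen for this sketch.
BUCKETS = {
    "<2K": (4.89, 9.37),
    "2K-10K": (2.25, 3.81),
    "10K-25K": (1.42, 2.15),
    "25K-50K": (1.02, 1.65),
    "50K-128K": (0.74, 1.10),
    "128K+": (0.71, 1.31),
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Round to whole percentages, matching the precision used in the text.
increases = {name: round(pct_change(b, a)) for name, (b, a) in BUCKETS.items()}

for name, pct in increases.items():
    print(f"{name:>8}: +{pct}%")
```

The rounded results match the percentages quoted for each bucket.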
Response-length dynamics underlie much of the shift. OpenRouter reports that for inputs over 10,000 tokens, responses were 19 to 34% shorter than with GPT-5.4, which reduced output-side exposure and limited cost growth for long contexts. By contrast, responses to prompts of 2,000 to 10,000 tokens ran roughly 52% longer, and responses to short prompts under 2,000 tokens barely changed, so the list-price increase translated almost directly into a near-doubling of effective cost for many short use cases.
A broader pricing trend is already visible across vendors. Anthropic raised Opus 4.7 prices by about 30 to 40%, citing higher token consumption, while a benchmark-based study from Artificial Analysis (using synthetic workloads rather than production logs) reported only about a 20% increase. Both OpenAI and Anthropic are publicly reported to be preparing for IPOs, and the mix of list-rate increases and real-world cost growth suggests rising API bills across providers.
For builders the implications are immediate: short-prompt workloads can see effective per-million-token costs almost double; mid-length prompts may incur roughly 50 to 70% higher spend; and long-context scenarios can show mixed effects depending on whether responses shorten. Teams should re-run cost projections on production traffic, add instrumentation to measure $/M-token by workload, and consider batching, truncation, or cheaper model fallbacks while weighing longer contexts against bigger token bills.
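A minimal sketch of that kind of instrumentation follows, under several assumptions that are not from OpenRouter: effective cost is taken as total dollars divided by total tokens (input plus output), bucket boundaries are treated as half-open ranges, and caching discounts or other billing adjustments are ignored. The function names and the sample requests are hypothetical, so the output will not reproduce the figures reported above.

```python
from collections import defaultdict

# Upper bounds (input tokens) mirroring the ranges quoted in this post.
# Treating each range as half-open [lower, upper) is an assumption.
BUCKET_EDGES = [
    (2_000, "<2K"),
    (10_000, "2K-10K"),
    (25_000, "10K-25K"),
    (50_000, "25K-50K"),
    (128_000, "50K-128K"),
]


def bucket_for(input_tokens):
    """Return the input-length bucket label for one request."""
    for upper, label in BUCKET_EDGES:
        if input_tokens < upper:
            return label
    return "128K+"


def cost_per_million_by_bucket(requests, input_rate, output_rate):
    """Blended dollars per million total tokens, grouped by input-length bucket.

    requests: iterable of (input_tokens, output_tokens) pairs.
    input_rate, output_rate: list prices in dollars per million tokens.
    """
    dollars = defaultdict(float)
    tokens = defaultdict(int)
    for input_tokens, output_tokens in requests:
        label = bucket_for(input_tokens)
        dollars[label] += (
            input_tokens * input_rate + output_tokens * output_rate
        ) / 1_000_000
        tokens[label] += input_tokens + output_tokens
    return {
        label: dollars[label] / tokens[label] * 1_000_000
        for label in dollars
        if tokens[label] > 0
    }


# Hypothetical traffic sample: (input_tokens, output_tokens) per request.
sample = [(1_000, 200), (1_500, 300), (20_000, 1_000)]

old = cost_per_million_by_bucket(sample, input_rate=2.50, output_rate=15.00)
new = cost_per_million_by_bucket(sample, input_rate=5.00, output_rate=30.00)

for label in old:
    print(f"{label}: ${old[label]:.2f}/M -> ${new[label]:.2f}/M")
```

Because this sample holds response lengths fixed, every bucket exactly doubles. In real traffic the response-length shifts described earlier are what move the ratio, which is why the measurement needs to run on logged token counts from each model rather than on list prices alone.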
The OpenRouter numbers cited come from usage logs dated April 2026; the pricing comparisons use OpenAI’s published GPT-5.4 and GPT-5.5 token rates. The bucketed effective costs, which combine the list-rate change with the April log dataset, were reported on May 10, 2026. The discrepancy between the production-log findings and the benchmark study underscores that lab benchmarks and live traffic can yield divergent cost outcomes.