Google DeepMind's Gemini 3.5 Flash is faster and more capable but costs 5.5× more in benchmarks

5/20/2026, 2:07:01 PM

Gemini 3.5 Flash delivers more than 280 output tokens per second and a 1,000,000‑token context window, but Artificial Analysis finds it runs about 5.5× the benchmark cost of Gemini 3 Flash after Google raised per‑token prices.

Google DeepMind has released Gemini 3.5 Flash, a throughput‑focused model that, in early tests, significantly raises agent speed and context capacity but also sharply increases operating costs. Artificial Analysis reports the model produced over 280 output tokens per second while supporting a 1,000,000‑token context window, making it the fastest model in its class in those benchmarks. The same analysis concluded that overall run costs jumped substantially compared with prior Flash releases.

The pricing changes are concrete and sizable: both per‑token rates tripled. Google now charges $1.50 per million input tokens and $9.00 per million output tokens for Gemini 3.5 Flash, up from $0.50 and $3.00 on Gemini 3 Flash. While those per‑token rates remain below Gemini 3.1 Pro’s $2.00/$12.00, Artificial Analysis found that higher token consumption on real tasks pushes total operating costs up further, producing roughly a 5.5× cost increase versus Gemini 3 Flash in its benchmark suite.
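As a rough illustration of how those figures relate, the sketch below splits the reported 5.5× cost increase into a price component and a token-volume component. It uses only the prices and the 5.5× figure quoted above. The implied token multiplier is a back-of-the-envelope estimate, not a number from Artificial Analysis, and it assumes the mix of input and output tokens stayed roughly the same between the two models.

```python
# Published prices in USD per million tokens, as quoted in the article.
GEMINI_3_FLASH = {"input": 0.50, "output": 3.00}
GEMINI_3_5_FLASH = {"input": 1.50, "output": 9.00}

# Benchmark cost multiplier reported by Artificial Analysis.
REPORTED_COST_MULTIPLIER = 5.5


def price_ratios(old: dict, new: dict) -> dict:
    """Return the new/old price ratio for each token type."""
    return {kind: new[kind] / old[kind] for kind in old}


ratios = price_ratios(GEMINI_3_FLASH, GEMINI_3_5_FLASH)

# Input and output prices rose by the same factor, so the blended price
# multiplier does not depend on the input/output mix.
price_multiplier = ratios["input"]

# Assumption: the input/output mix is unchanged, so whatever the price
# change does not explain is attributed to higher token volume.
implied_token_multiplier = REPORTED_COST_MULTIPLIER / price_multiplier

print(f"price multiplier: {price_multiplier:.2f}x")
print(f"implied token multiplier: {implied_token_multiplier:.2f}x")
```

Under that assumption, the price change alone accounts for a 3× increase, and the remainder would correspond to somewhat less than a doubling of tokens consumed across the benchmark suite.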

Agent workloads are the primary driver of that cost gap. In Artificial Analysis’s agent tests, Gemini 3.5 Flash required many more interaction steps and consumed many more tokens per session than previous Flash models, which pushed aggregate benchmark costs above Gemini 3.1 Pro by about 75%. The model delivers its largest gains on agentic and multimodal benchmarks, but those same interaction patterns amplify spending compared with models that use fewer tokens per task.

On capability metrics, Gemini 3.5 Flash shows measurable gains but retains notable weaknesses. Artificial Analysis scored it 55 on its Intelligence Index, nine points higher than Gemini 3 Flash, and it improved by 11 points on the AA Omniscience metric. The report also found the model’s hallucination rate fell to 61%, a 31 percentage‑point reduction from the prior Flash release. Even so, these improvements still leave Gemini 3.5 Flash behind the best performers on hallucination and answer accuracy in the same tests.

The price and consumption dynamics mirror broader industry trends noted in the analysis. Anthropic’s Opus 4.7 exhibited an effective 30 to 40% cost increase over its predecessor because of higher token consumption, while OpenAI’s GPT‑5.5 posted a 50 to 90% cost rise versus GPT‑5.4, driven mainly by higher base prices even as token usage fell. Google’s 3.5 Flash stands out by increasing both per‑token prices and typical token consumption simultaneously.

For builders and purchasers, the practical lesson is that headline per‑token pricing no longer captures real efficiency. Higher throughput and stronger agent capabilities may justify Gemini 3.5 Flash for latency‑sensitive or complex agent applications, but teams should measure tokens consumed per end‑to‑end task and compare total run costs across models rather than rely on list prices. That is especially important for programming workloads, where Artificial Analysis found Gemini 3.5 Flash trailing competitors such as GPT‑5.5 and Claude Opus 4.7.
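One way to make that comparison concrete is to compute cost per completed task from measured token counts. The following is a minimal sketch under stated assumptions: the prices are the ones quoted in this article, while the `Pricing` class, the `task_cost` helper, and the per-task token counts are hypothetical placeholders that a team would replace with its own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-token list prices in USD per million tokens."""
    input_per_million: float
    output_per_million: float


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one end-to-end task at the given prices."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Prices as quoted in the article.
PRICES = {
    "Gemini 3 Flash": Pricing(0.50, 3.00),
    "Gemini 3.5 Flash": Pricing(1.50, 9.00),
    "Gemini 3.1 Pro": Pricing(2.00, 12.00),
}

# Hypothetical (input, output) token counts per task, for illustration only.
# Real values should come from logging your own workload on each model.
MEASURED_TOKENS = {
    "Gemini 3 Flash": (100_000, 20_000),
    "Gemini 3.5 Flash": (200_000, 40_000),
    "Gemini 3.1 Pro": (80_000, 15_000),
}

costs = {
    name: task_cost(PRICES[name], *MEASURED_TOKENS[name])
    for name in PRICES
}

# Print models from cheapest to most expensive per task.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.2f} per task")
```

With these placeholder token counts, a model with a lower list price can still come out more expensive per task than one with a higher list price, which is the effect the benchmark results describe.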

Sources

The Decoder AI · 5/20/2026
