Deepseek makes V4 Pro 75% discount permanent, prices output tokens at $0.87 per million

5/23/2026, 6:05:12 PM

Deepseek announced on X that its temporary 75% discount for V4 Pro is now permanent, setting input tokens at $0.435 per million and output at $0.87 per million; V4‑Flash and cache pricing are also listed.

Deepseek announced on X that the 75% discount on its flagship model, Deepseek V4 Pro, is now permanent; the promotion had been due to expire on May 31, 2026. The firm framed the move as a structural price cut rather than a short‑term offer: the lower rates become the model’s commercial baseline and could materially lower operating costs for users who generate large volumes of tokens.

Under the new standing pricing, Deepseek lists V4 Pro input tokens at $0.435 per million without a cache hit and output tokens at $0.87 per million; input tokens served from cache fall to $0.003625 per million. The company also offers a V4‑Flash tier priced at $0.14 per million input tokens, $0.0028 per million cached input tokens and $0.28 per million output tokens. Both V4 Pro and V4‑Flash advertise a one‑million‑token context window, support up to 384,000 output tokens, and accept OpenAI and Anthropic API formats to ease developer integration.
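To make the rates concrete, here is a minimal sketch of a per-request cost estimate built only from the prices quoted above. The names `Pricing` and `request_cost` and the `cache_hit_ratio` parameter are illustrative assumptions, not part of any Deepseek SDK, and the sketch assumes billing is strictly linear in token counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, as quoted in the article."""
    input_miss: float  # input tokens without a cache hit
    input_hit: float   # input tokens served from cache
    output: float      # generated tokens


# Prices taken from the article; the variable names are hypothetical.
V4_PRO = Pricing(input_miss=0.435, input_hit=0.003625, output=0.87)
V4_FLASH = Pricing(input_miss=0.14, input_hit=0.0028, output=0.28)


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """Estimate the USD cost of one request.

    cache_hit_ratio is the assumed fraction of input tokens served from cache.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (miss_tokens * pricing.input_miss
             + hit_tokens * pricing.input_hit
             + output_tokens * pricing.output)
    return total / 1_000_000


# One million uncached input tokens plus one million output tokens on V4 Pro.
full_price_run = request_cost(V4_PRO, 1_000_000, 1_000_000)
# The same input fully served from cache, with no output, on V4-Flash.
cached_flash_run = request_cost(V4_FLASH, 1_000_000, 0, cache_hit_ratio=1.0)
```

Under these assumptions, a V4 Pro run with one million uncached input tokens and one million output tokens comes to $1.305.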

Those headline prices place Deepseek well below what Western frontier labs currently charge. GPT‑5.5 is listed at $5 per million input and $30 per million output (rising to $10/$45 for long‑context runs over roughly 272,000 tokens), while Anthropic’s Opus 4.7 is shown at $5 input and $25 output per million. By those comparisons V4 Pro is roughly 11.5× cheaper on standard input and about 34.5× cheaper on output versus GPT‑5.5; versus GPT‑5.5 long‑context pricing the gaps widen to about 23× on input and roughly 51.7× on output. V4‑Flash is cheaper still.
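The multiples follow from dividing each competitor price by the corresponding V4 Pro price. A short sketch using only the figures quoted in this article (the dictionary layout and the helper name `price_multiple` are illustrative assumptions):

```python
# USD per million tokens, as quoted in the article: (input, output).
V4_PRO = (0.435, 0.87)
COMPETITORS = {
    "GPT-5.5": (5.0, 30.0),
    "GPT-5.5 long-context": (10.0, 45.0),
    "Opus 4.7": (5.0, 25.0),
}


def price_multiple(competitor_price: float, baseline_price: float) -> float:
    """How many times more expensive the competitor is, rounded to one decimal."""
    return round(competitor_price / baseline_price, 1)


# Map each competitor to its (input multiple, output multiple) versus V4 Pro.
multiples = {
    name: (price_multiple(inp, V4_PRO[0]), price_multiple(out, V4_PRO[1]))
    for name, (inp, out) in COMPETITORS.items()
}

for name, (input_x, output_x) in multiples.items():
    print(f"{name}: {input_x}x on input, {output_x}x on output")
```

The same division applied to the Opus 4.7 prices gives about 11.5× on input and about 28.7× on output.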

Price per token, however, is only one part of the economics. The report cites two models, Google’s Gemini Flash 3.5 and Anthropic’s Opus 4.7, that burn more tokens than their predecessors, while GPT‑5.5 reportedly uses fewer tokens than GPT‑5.4. Independent benchmarks and real‑world usage will determine net costs, and coverage notes that Deepseek V4 still trails top frontier models on raw performance; teams must weigh task‑dependent tradeoffs among latency, accuracy and token efficiency.
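One way to fold token efficiency and accuracy into a single figure is an expected cost per successful task. The sketch below assumes failed attempts are simply retried, with each attempt succeeding independently, so the expected number of attempts is 1 / success rate. The token counts and success rates are hypothetical placeholders for illustration, not measurements of any model; only the output prices come from the article.

```python
def cost_per_successful_task(price_per_million: float,
                             tokens_per_attempt: int,
                             success_rate: float) -> float:
    """Expected USD cost to obtain one successful result.

    Assumes independent retries, so expected attempts = 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt * price_per_million / 1_000_000
    return cost_per_attempt / success_rate


# Hypothetical scenario: the cheaper model emits 3x the output tokens and
# succeeds less often. Output prices are the article's; the rest is invented.
cheap = cost_per_successful_task(0.87, tokens_per_attempt=6_000, success_rate=0.5)
pricey = cost_per_successful_task(30.0, tokens_per_attempt=2_000, success_rate=0.9)

print(f"cheaper model: ${cheap:.5f} per successful task")
print(f"pricier model: ${pricey:.5f} per successful task")
```

In this invented scenario the cheaper model still wins on cost, but the margin narrows from the 34.5× price gap to roughly 6×. The figure ignores latency and input-token costs entirely.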

Practically, the permanent cut matters most for agentic systems and high‑throughput applications that generate substantial output: Deepseek’s pricing can materially lower operating costs and may push price‑sensitive teams to prioritize “good‑enough” models over higher‑performing but costlier alternatives. The company is entering its first funding round and, according to reporting, faces different revenue pressure than firms moving toward IPOs. That economic backdrop could sustain aggressive pricing competition while adoption and ROI remain hard to measure.

Sources

The Decoder AI · 5/23/2026
