August update of Cohere models: architectural leap for the Command series

Background

Cohere has released a major update for its flagship Command R and Command R+ models, significantly increasing their throughput, reducing hardware requirements, and introducing customizable contextual safety modes.

Alina Karpova

4/25/2026, 4:01:01 PM

Canadian AI lab Cohere has unveiled a major August update to its flagship line of generative language models, officially releasing enhanced versions of Command R and Command R+. The new versions are available through the API under the model identifiers command-r-08-2024 and command-r-plus-08-2024 respectively, and show improved performance across a wide range of tasks. The update arrives at a critical point for the industry, as enterprises begin to fundamentally rethink their long-term AI adoption strategies.

The most visible practical change is in throughput and hardware requirements. Compared with the previous Command R version, command-r-08-2024 provides approximately 50% higher throughput and 20% lower latency. More importantly for enterprise clients, this speed increase comes with a twofold reduction in the hardware needed to serve the model. The larger command-r-plus-08-2024 also delivers approximately 50% higher throughput, along with 25% lower latency, while keeping its previous hardware requirements.
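The figures above can be combined into a rough per-hardware comparison. The sketch below is only arithmetic on the quoted percentages, and it assumes the throughput gain and the hardware reduction compose multiplicatively, which the article does not state explicitly. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseDelta:
    """Relative changes versus the previous version, as quoted in the article."""
    throughput_gain: float    # 0.50 means 50% higher throughput
    latency_reduction: float  # 0.20 means 20% lower latency
    hardware_factor: float    # 0.5 means half the hardware, 1.0 means unchanged


def throughput_per_hardware_unit(delta: ReleaseDelta) -> float:
    """Throughput per unit of hardware relative to the old version (1.0 = unchanged).

    Assumes the two quoted figures simply multiply together.
    """
    return (1.0 + delta.throughput_gain) / delta.hardware_factor


def relative_latency(delta: ReleaseDelta) -> float:
    """Latency as a fraction of the old version's latency."""
    return 1.0 - delta.latency_reduction


COMMAND_R = ReleaseDelta(throughput_gain=0.50, latency_reduction=0.20, hardware_factor=0.5)
COMMAND_R_PLUS = ReleaseDelta(throughput_gain=0.50, latency_reduction=0.25, hardware_factor=1.0)

if __name__ == "__main__":
    for name, delta in (
        ("command-r-08-2024", COMMAND_R),
        ("command-r-plus-08-2024", COMMAND_R_PLUS),
    ):
        print(name, throughput_per_hardware_unit(delta), relative_latency(delta))
```

Under that assumption, the smaller model would deliver about three times the throughput per unit of hardware, and the larger one about one and a half times.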

The August release is especially relevant for corporate systems built on Retrieval Augmented Generation (RAG). Both models have received significant improvements in multilingual RAG, so users can ask questions in their native language and get more accurate answers. Citation quality has also improved, and developers whose workflows do not require strict referencing can now turn citations off completely. In addition, the models are better at analyzing structured data and at producing structured data from unstructured natural language instructions.
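A grounded request with citations switched off might be assembled roughly as follows. This is a minimal sketch that only builds the request body and makes no network call. The field names ("message", "documents", "citation_quality") and the "off" value are assumptions about Cohere's Chat API and should be checked against the current API reference. The helper name build_rag_request is hypothetical.

```python
import json
from typing import Dict, List


def build_rag_request(
    question: str,
    documents: List[Dict[str, str]],
    with_citations: bool = True,
    model: str = "command-r-08-2024",
) -> Dict[str, object]:
    """Assemble a JSON-serialisable body for a grounded (RAG) chat request.

    Field names are assumptions about the Chat API, not confirmed by the article.
    """
    if not documents:
        raise ValueError("RAG needs at least one document to ground the answer")
    body: Dict[str, object] = {
        "model": model,
        "message": question,     # may be written in the user's own language
        "documents": documents,  # snippets the answer should be grounded in
    }
    if not with_citations:
        # Assumed switch for disabling citations entirely.
        body["citation_quality"] = "off"
    return body


if __name__ == "__main__":
    docs = [{"title": "Leave policy", "snippet": "Employees receive 25 days of paid leave."}]
    request = build_rag_request("Combien de jours de congé ai-je ?", docs, with_citations=False)
    print(json.dumps(request, ensure_ascii=False, indent=2))
```

Leaving with_citations at its default omits the citation field altogether, so the service's own default behaviour applies.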

The way the models handle complex, multi-step agentic workflows and external tools has also changed significantly. Both command-r-08-2024 and command-r-plus-08-2024 make better decisions about which tool to use in a given context, and about when it is better not to use a tool at all. The models follow instructions in the preamble more closely and are more robust to non-semantic changes in prompts, such as extra spaces or new lines.
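One way to probe the robustness claim in your own application is to send prompts that differ only in whitespace and compare the answers. The sketch below generates such variants. The function names are illustrative, and `ask` is a hypothetical callable that wraps whatever client you use to query the model. Exact string equality is a deliberately strict criterion, so a real evaluation would likely use a softer comparison.

```python
import re
from typing import Callable, List


def whitespace_variants(prompt: str) -> List[str]:
    """Return copies of the prompt that differ only in spacing and line breaks."""
    return [
        prompt,
        "  " + prompt + "  ",           # leading and trailing spaces
        prompt.replace(" ", "  "),      # doubled inner spaces
        prompt.replace(". ", ".\n\n"),  # blank line between sentences
        prompt + "\n",                  # trailing newline
    ]


def canonical(text: str) -> str:
    """Collapse every run of whitespace to one space, to confirm variants share the same words."""
    return re.sub(r"\s+", " ", text).strip()


def consistent(ask: Callable[[str], str], prompt: str) -> bool:
    """True if the hypothetical `ask` callable returns the same answer for every variant."""
    answers = {ask(variant) for variant in whitespace_variants(prompt)}
    return len(answers) == 1


if __name__ == "__main__":
    prompt = "Summarize the report. Use the search tool only if needed."
    for variant in whitespace_variants(prompt):
        print(repr(variant), canonical(variant) == prompt)
```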

To support a wider range of enterprise use cases, the update also introduces a more granular, context-dependent approach to content moderation. Historically, safety guardrails in generative AI have been predominantly reactive and binary, which made it hard for users to define safe usage boundaries for their particular business needs. Recognizing that safety and appropriateness depend heavily on context, and that predictability and control are key to building trust in its technology, Cohere has introduced a new beta feature called Safety Modes.

The new Safety Modes give developers three options to match the requirements of a specific project. STRICT mode, intended for general and corporate use, adds a layer of safety by strictly prohibiting inappropriate responses or recommendations and encouraging the model to avoid all potentially sensitive topics. CONTEXTUAL mode, now enabled by default, is designed for a wider range of interactions for entertainment, creative, or educational purposes. The third option turns the safety mode off.
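Selecting a mode per request could look something like the sketch below. It only builds a request body and makes no network call. The "safety_mode" field name and the "NONE" value for the third option are assumptions that the article does not confirm, and the helper name is hypothetical, so verify all of them against Cohere's documentation before relying on them.

```python
from typing import Dict

# STRICT and CONTEXTUAL are named in the article; "NONE" for the third option is an assumption.
SAFETY_MODES = ("CONTEXTUAL", "STRICT", "NONE")
DEFAULT_SAFETY_MODE = "CONTEXTUAL"  # the article describes CONTEXTUAL as the default


def with_safety_mode(request: Dict[str, object], mode: str = DEFAULT_SAFETY_MODE) -> Dict[str, object]:
    """Return a copy of the request body with a validated safety mode attached.

    The "safety_mode" field name is an assumption about the Chat API.
    """
    normalized = mode.strip().upper()
    if normalized not in SAFETY_MODES:
        raise ValueError(f"unknown safety mode {mode!r}; expected one of {SAFETY_MODES}")
    updated = dict(request)  # shallow copy so the caller's dict is not modified
    updated["safety_mode"] = normalized
    return updated


if __name__ == "__main__":
    base = {"model": "command-r-plus-08-2024", "message": "Draft a customer apology email."}
    print(with_safety_mode(base, "strict"))
```

Validating the value on the client side means a typo fails early instead of surfacing as an API error.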

The final element of the August update is a revised pricing structure that reflects the differences in computational power and intended purpose of the two models. The flagship command-r-plus-08-2024 costs $2.50 per million input tokens and $10.00 per million output tokens. The lighter, hardware-optimized command-r-08-2024 is considerably cheaper, at $0.15 per million input tokens and $0.60 per million output tokens.
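To see what these prices mean for a concrete workload, the arithmetic can be sketched as below. The prices are the ones quoted above, while the function name and the example token counts are illustrative. Decimal is used instead of float to avoid rounding noise in currency amounts.

```python
from decimal import Decimal

# USD per one million tokens as (input, output), taken from the article.
PRICES_PER_MILLION = {
    "command-r-plus-08-2024": (Decimal("2.50"), Decimal("10.00")),
    "command-r-08-2024": (Decimal("0.15"), Decimal("0.60")),
}
MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_price * input_tokens + output_price * output_tokens) / MILLION


if __name__ == "__main__":
    # Illustrative workload: 2M input tokens and 500k output tokens.
    for model in PRICES_PER_MILLION:
        print(model, estimate_cost(model, 2_000_000, 500_000))
```

For this example workload the estimate is 10.00 USD on command-r-plus-08-2024 and 0.60 USD on command-r-08-2024, roughly a 17-fold difference.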

Sources

Cohere Changelog · 4/4/2026
