IBM Launches Granite 4.1 LLMs: A Data-Centric Approach to Open AI Development

4/30/2026, 5:21:18 AM

IBM has unveiled the Granite 4.1 family, its latest suite of large language models, emphasizing efficiency and performance through careful data engineering. The models use a dense, decoder-only transformer architecture and are available in three sizes: 3 billion, 8 billion, and 30 billion parameters. The dense design aims for streamlined processing while maintaining robust capabilities.

The Granite 4.1 architecture combines several established components: Grouped Query Attention (GQA) for improved efficiency, Rotary Position Embeddings (RoPE) for encoding token positions across the sequence, SwiGLU activations, RMSNorm for stable training, and shared input/output embeddings to reduce parameter count. All three model sizes share the same training pipeline and data strategy, differing primarily in architectural dimensions such as embedding size, number of layers, and attention head configuration.
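To make three of these components concrete, here is a minimal, dependency-free sketch of RMSNorm, a SwiGLU feed-forward block, and the head mapping behind Grouped Query Attention. It is illustrative only and not IBM's implementation: the function names, the tiny list-based tensors, and the omission of RoPE and learned projections are all simplifying assumptions.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of the vector (no mean subtraction)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(x, w):
    """Row vector x (length n) times matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: (SiLU(x W_gate) * (x W_up)) W_down."""
    hidden = [silu(g) * u for g, u in zip(matvec(x, w_gate), matvec(x, w_up))]
    return matvec(hidden, w_down)

def kv_head_for(query_head, n_q_heads, n_kv_heads):
    """In GQA, consecutive groups of query heads share one key/value head."""
    group_size = n_q_heads // n_kv_heads
    return query_head // group_size

def causal_attention_head(q, k, v):
    """Scaled dot-product attention for one head; position t sees positions 0..t."""
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[s][j] for s, e in enumerate(exps))
                    for j in range(len(v[0]))])
    return out

def grouped_query_attention(q_heads, k_heads, v_heads):
    """Run every query head against the key/value head assigned to its group."""
    n_q, n_kv = len(q_heads), len(k_heads)
    out = []
    for h, q in enumerate(q_heads):
        kv = kv_head_for(h, n_q, n_kv)
        out.append(causal_attention_head(q, k_heads[kv], v_heads[kv]))
    return out
```

With four query heads and two key/value heads, for example, heads 0 and 1 read from key/value head 0 and heads 2 and 3 read from head 1, so only half as many key/value tensors need to be computed and cached as in standard multi-head attention.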

Granite 4.1 models were trained from scratch on approximately 15 trillion tokens using a multi-stage pre-training pipeline. The pipeline comprised five phases, designed to favor data quality over sheer volume as training progressed. Initial phases established broad language understanding, followed by mid-training stages that annealed on progressively higher-quality data. The final phase was dedicated to long-context extension, expanding the models' context window to 512K tokens.

Each pre-training phase used a distinct data mixture and learning-rate schedule, gradually shifting from broad web-scale data to more curated, domain-specific content. Initial phases relied on a high proportion of CommonCrawl data to establish general language understanding. Subsequent phases sharply increased the proportion of code and mathematical data: by phase two, math data had increased fivefold and code data 1.5x. Later stages introduced high-quality synthetic data, chain-of-thought examples, and various instruction-tuning data, reflecting a progression from general knowledge acquisition to specialized reasoning.
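As a rough illustration of what such a shift does to a data mixture, the sketch below applies the reported multipliers (5x math, 1.5x code) to a set of phase-one shares and lets web data absorb the difference. The starting shares are hypothetical placeholders, not IBM's published numbers, and the sketch assumes the multipliers apply to mixture proportions rather than to absolute token counts. The helper `rescale_mixture` is likewise an invented name.

```python
def rescale_mixture(base, multipliers, filler):
    """Scale selected sources, then give `filler` whatever share is left over."""
    mix = {name: share * multipliers.get(name, 1.0)
           for name, share in base.items() if name != filler}
    remainder = 1.0 - sum(mix.values())
    if remainder < 0:
        raise ValueError("scaled shares exceed 100% of the mixture")
    mix[filler] = remainder
    return mix

# Hypothetical phase-one shares, chosen only to make the arithmetic visible.
phase_one = {"web": 0.80, "code": 0.15, "math": 0.05}

# Multipliers taken from the article: math x5, code x1.5 by phase two.
phase_two = rescale_mixture(phase_one, {"math": 5.0, "code": 1.5}, filler="web")
```

Under these assumed starting shares, the web share drops from 80% to roughly 52.5%, while math rises to 25% and code to about 22.5%. This shows how even modest multipliers on small categories can reshape the overall mix.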

The final pre-training phase, Long Context Extension (LCE), expanded the context window from 4K to 512K tokens in staged increments, using a data mix enriched with books and code repositories. After pre-training, the Granite 4.1 models were refined through supervised fine-tuning on approximately 4.1 million high-quality curated samples. This was followed by a multi-stage reinforcement learning pipeline using on-policy GRPO with DAPO loss, applied to strengthen mathematical reasoning, coding proficiency, instruction following, and general conversational ability.
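For readers unfamiliar with the reinforcement learning terms, the following sketch shows the two ideas in miniature, based on how GRPO and DAPO are generally described in the literature rather than on IBM's training code. GRPO scores each sampled response relative to the other responses for the same prompt, and a DAPO-style loss averages a clipped objective over all tokens with a wider upper clip bound. The function names, the clip values, and the use of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]

def dapo_style_loss(ratios, advantages, clip_low=0.2, clip_high=0.28):
    """Token-level clipped policy loss with asymmetric clipping.

    ratios:     per-response lists of per-token probability ratios (new / old policy)
    advantages: one scalar advantage per response, shared by all of its tokens
    The clip values are illustrative defaults, not Granite 4.1 settings.
    """
    total, n_tokens = 0.0, 0
    for token_ratios, adv in zip(ratios, advantages):
        for r in token_ratios:
            clipped = min(max(r, 1.0 - clip_low), 1.0 + clip_high)
            total += min(r * adv, clipped * adv)  # pessimistic (PPO-style) bound
            n_tokens += 1
    # Average over every token in the batch, so long responses are not down-weighted.
    return -total / n_tokens

# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are computed within each group, no separate value network is needed. If every response in a group receives the same reward, all advantages are zero and that group contributes no gradient.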

In a field where model development often prioritizes scaling parameters, Granite 4.1 aims to show that rigorous data curation and efficient architectural design can yield highly competitive performance. Notably, the 8 billion parameter instruct model in the Granite 4.1 family is reported to match or surpass the previous Granite 4.0-H-Small, a significantly larger 32 billion parameter Mixture-of-Experts (MoE) model. The result supports IBM's data-centric approach to building smaller yet highly capable models, which are more accessible and easier to deploy across diverse applications.

IBM has released the entire Granite 4.1 model family as open source under the permissive Apache 2.0 license. Developers and researchers across industries can freely use and build on the models without proprietary licensing fees, which IBM positions as a contribution to collaborative progress in the broader AI community.

Sources

Hugging Face Blog · 4/29/2026
