Meta is transitioning MTIA to a modular architecture to scale AI chips

To run generative neural networks cost-effectively at global scale, Meta has moved to an iterative, modular approach to hardware development.

Elena Vorontsova

4/25/2026, 6:40:29 AM

Every day, billions of users on Meta platforms interact with AI technology, receiving personalized recommendations and using intelligent assistants. Serving a wide range of models at global scale while keeping costs as low as possible is one of the most complex infrastructure challenges in the technology industry today. In response, the company is charting a new development path: flexible hardware that keeps improving as needs change. Meta remains committed to a diverse portfolio of chips, both in-house and third-party, chosen from the best solutions available, but it places special emphasis on its own line of accelerators.

The core systemic problem in today's microelectronics market is that AI models evolve much faster than traditional hardware development cycles can anticipate. Chip design is typically based on forecasted workloads, but by the time the hardware reaches mass production, which often takes about two years, those workloads may have changed significantly. To avoid relying on long-term bets and long waits, Meta engineers deliberately chose an iterative development approach.

The foundation for this acceleration was laid by the first two generations of chips, described in research papers at the ISCA’23 and ISCA’25 conferences. Initially known as MTIA 1 and MTIA 2i and now designated MTIA 100 and MTIA 200, these chips were deployed in production in the hundreds of thousands of units. Many internal production models were launched on them, and the chips were also successfully tested with large language models such as Llama. After deploying these first versions, the company accelerated development of four subsequent generations: MTIA 300, 400, 450, and 500.

The MTIA 300 architecture is the cost-effective foundation for this expansion. It was initially optimized for the ranking and recommendation models that dominated Meta's systems before the widespread adoption of generative AI, and the chip is currently in production for training recommendation systems. Its distinctive features include integrated network chiplets, dedicated messaging mechanisms that offload communication, and near-memory computation. The hardware configuration comprises one compute chiplet, two network chiplets, and several stacks of high-bandwidth memory (HBM).

As demand for generative AI grew rapidly, the base architecture evolved into MTIA 400. This version was adapted to better support generative models while retaining the ability to run traditional ranking workloads efficiently. With a scale-up domain of 72 accelerators, the chip delivers performance that is competitive with leading commercial products. Engineers have completed laboratory testing of this version and are on track to deploy it in data centers.

Anticipating a sharp increase in demand for generative AI inference, the development team moved on to MTIA 450, which adds optimizations targeted at that workload. Because memory bandwidth is the most critical factor in the speed of such models, engineers doubled it compared to MTIA 400, making it significantly higher than that of existing advanced commercial counterparts. In addition, the architecture gained low-precision data types designed specifically for inference workloads. Mass deployment of these accelerators is scheduled for early 2027.
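To illustrate why memory bandwidth matters so much for inference, here is a minimal back-of-the-envelope sketch. It assumes a simple memory-bound model in which every weight is read from memory once per generated token, and it ignores caches, batching, KV-cache traffic, and the per-block scale overhead of low-precision formats. The function name and all numbers are hypothetical and are not MTIA specifications.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billion: float,
                             bits_per_weight: float) -> float:
    """Rough upper bound on single-stream decode rate for a memory-bound model.

    Assumes each weight is streamed from memory exactly once per token.
    """
    bytes_per_token = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical accelerator and model, chosen only to make the arithmetic easy.
baseline = decode_tokens_per_second(1000, 8, 8)         # 8-bit weights
double_bw = decode_tokens_per_second(2000, 8, 8)        # bandwidth doubled
half_precision = decode_tokens_per_second(1000, 8, 4)   # 4-bit weights

print(baseline, double_bw, half_precision)
```

Under these assumptions, doubling bandwidth and halving the bits per weight each double the bound, which is consistent with the article's pairing of higher memory bandwidth with low-precision data types.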

Continuing its focus on generative AI inference, Meta announced the last chip on its current roadmap: MTIA 500. In this version, high-bandwidth memory throughput increases by another 50 percent compared to MTIA 450, and further data-processing innovations are introduced. Mass deployment of this chip is also slated for 2027. Across the progression from MTIA 300 to MTIA 500, memory bandwidth increases 4.5 times and floating-point compute throughput increases 25 times, with the latter comparison moving from the MX8 format to the MX4 format.
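The bandwidth figures quoted in the article can be cross-checked with a little arithmetic. The sketch below assumes that all three ratios refer to the same metric (HBM bandwidth) and are exact rather than rounded; the variable names are illustrative. The article does not state the MTIA 300 to MTIA 400 step directly, so the value computed here is an inference, not a published figure.

```python
from fractions import Fraction

# Ratios stated in the article.
total_300_to_500 = Fraction(9, 2)   # "four and a half times"
step_400_to_450 = Fraction(2)       # bandwidth doubled
step_450_to_500 = Fraction(3, 2)    # another fifty percent

# Combined gain from MTIA 400 to MTIA 500.
gain_400_to_500 = step_400_to_450 * step_450_to_500

# Implied (not stated) gain from MTIA 300 to MTIA 400.
implied_300_to_400 = total_300_to_500 / gain_400_to_500

print(float(gain_400_to_500), float(implied_300_to_400))
```

If the stated ratios are rounded, the implied MTIA 300 to MTIA 400 step would shift accordingly.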

Sources

Meta AI Blog
