Google Cloud introduces TPU 8t and TPU 8i — eighth-generation systems for large-scale AI

4/23/2026, 8:41:56 AM

Google's eighth-generation TPUs rely on system-level co-design to accelerate the full AI lifecycle. TPU 8t targets the training of advanced models, while TPU 8i targets large-scale inference and reinforcement learning.

On April 22, 2026, Google Cloud announced the eighth generation of its Tensor Processing Units (TPUs), which includes the TPU 8t and TPU 8i. The new accelerators, introduced by Divakar Gupta and Sebastian Mugazambi, rely on system-level co-design to optimize the full AI lifecycle and to meet growing demands for performance and efficiency.

TPU 8t is optimized for large-scale pre-training and for workloads with a large volume of embeddings, while TPU 8i is designed for large-scale inference and reinforcement learning. Both systems are key components of Google Cloud's AI Hypercomputer architecture, which integrates hardware, software, and networking. The systems also incorporate Arm-based Axion processors, which eliminate host bottlenecks caused by data-preparation latency.

TPU 8t includes several significant technical enhancements. SparseCore is a specialized accelerator that efficiently handles the irregular memory access patterns of embedding lookups. The architecture overlaps work on the vector processing unit (VPU) and the matrix multiply unit (MXU) to maximize utilization of floating-point operations (FLOPs) and minimize the time spent on vector operations. Native FP4 support doubles MXU throughput while maintaining accuracy for large models, and it reduces power consumption by decreasing the amount of data moved. A single TPU 8t supersystem scales up to 9,600 chips over a 3D torus network topology.
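To make the 3D torus idea concrete, here is a minimal Python sketch of how neighbors and hop counts work in such a topology. It is a generic illustration, not Google's implementation: the function names and the 4x4x4 shape are hypothetical, and the announcement does not state the actual dimensions of the TPU 8t torus.

```python
def torus_neighbors(coord, dims):
    """Return the six wraparound neighbors of `coord` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo implements the wraparound link that closes each ring.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hop count from `a` to `b`, going the shorter way around each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


# Hypothetical small torus for illustration only (not the TPU 8t layout).
dims = (4, 4, 4)
print(torus_neighbors((0, 0, 0), dims))
print(torus_hops((0, 0, 0), (3, 2, 1), dims))
```

Because every axis wraps around, each chip has the same number of direct links regardless of its position, and the worst-case distance along an axis is half the ring length rather than the full length.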

The rise of massive Mixture-of-Experts (MoE) architectures and reasoning-focused models requires hardware to evolve. Modern accelerators must do more than add floating-point operations; they must also match the specific operational intensities of the latest workloads. The growth of agentic AI, which processes long context windows and complex sequential logic, underscores the need for specialized solutions.
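Operational intensity is commonly measured as FLOPs performed per byte of data moved. The sketch below estimates it for a plain matrix multiplication under simplifying assumptions: each operand and the output are moved exactly once, and all three use the same element size. The function name and the example shapes are hypothetical and do not come from the announcement.

```python
def matmul_intensity(m, k, n, bytes_per_elem):
    """Estimate FLOPs per byte for an (m x k) @ (k x n) matrix multiplication.

    Assumes each input and the output cross the memory interface exactly once.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# Large square matmul with 1-byte elements: many FLOPs per byte moved.
print(matmul_intensity(1024, 1024, 1024, 1.0))

# Same shape with 4-bit (0.5-byte) elements: half the bytes, twice the intensity.
print(matmul_intensity(1024, 1024, 1024, 0.5))

# Single-row input, as in one-token-at-a-time decoding: weight traffic dominates.
print(matmul_intensity(1, 1024, 1024, 1.0))
```

Under these assumptions, the single-row case performs fewer than two FLOPs per byte, which is why such workloads tend to be limited by data movement rather than by peak compute.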

Google Cloud's new TPUs are designed to address the challenges of developing "world models" that simulate future scenarios and learn through "imagination." They enable efficient training and serving of advanced systems such as Google DeepMind's Genie 3, allowing millions of agents to hone their reasoning skills in diverse simulated environments. Scalability, reliability, and efficiency remain central to the TPU design philosophy.

Sources

Google Cloud Blog — AI & Machine Learning · 4/22/2026
