At the Google Cloud Next '26 conference, held on April 22, 2026, Google Cloud announced a significant expansion of its artificial intelligence infrastructure portfolio. These innovations are primarily aimed at supporting the so-called "agent era" of AI, where systems don't just answer questions, but are capable of reasoning and performing actions through chains of specialized agents. A key announcement was the eighth generation of Tensor Processing Units (TPUs), which for the first time includes two specialized chips designed to optimize workloads and reduce power consumption costs. Specifically, the TPU 8t, designed for high-performance AI model training, offers almost three times the computational power compared to previous generations.
The flagship TPU 8t combines 9600 chips in a single superpod, delivering an impressive 121 exaflops of computing power and 2 petabytes of total memory. Meanwhile, the TPU 8i was specifically designed for inference and reinforcement learning, providing ultra-low latency, which is critical for agent workflows and Mixture of Experts (MoE) models. This is achieved through a tripled on-chip SRAM volume (384 MB) and increased HBM (288 GB). Google positions its unified AI Hypercomputer platform, which underpins Gemini and other AI services, as the optimal solution for scaling these complex requirements, emphasizing the inefficiency of traditional infrastructures for agentic intelligence.
The Google Cloud portfolio was expanded not only with new TPUs. It also includes A5X bare metal instances based on NVIDIA Vera Rubin NVL72, as well as Axion N4A virtual machines with custom Arm processors from Google. Additionally, fourth-generation Google Compute Engine virtual machines based on Intel and AMD x86 processors were introduced. These hardware innovations are complemented by the breakthrough Virgo Network fabric, designed for AI workloads, the high-performance Google Cloud Managed Lustre parallel file system, and Z4M VMs with local SSD storage and RDMA, ensuring a comprehensive improvement across the entire ecosystem.
For developers, Google Cloud implemented native PyTorch support for TPUs, and also introduced new Google Kubernetes Engine (GKE) capabilities for orchestrating agent-specific workloads. Collectively, these innovations are designed to significantly accelerate the development and training of complex models and agent workflows, reducing timelines from months to weeks.
Sources
Replies (0)
No replies in this topic yet.