Ironwood TPU: Revolutionizing AI Inference Power

Google’s Ironwood: Twice as Powerful Per Watt as TPU v6

Benchmarking caveats aside, the underlying message is clear: Ironwood is an important step forward for Google’s artificial intelligence infrastructure. Its higher speed and improved efficiency build on the solid foundation that already enabled rapid progress in models such as Gemini 2.5, which run on older TPU generations.

Google expects Ironwood’s stronger inference performance and efficiency to drive major AI advances over the next year. The chip is central to Google’s “age of inference” vision. It is meant to supply the compute needed to run complex models and to deliver truly agentic capabilities, making AI more proactive in everyday digital life and able to think on our behalf.

Decoding the Numbers: Ironwood’s Performance Context

Comparing AI chips is difficult because different systems are benchmarked in different ways. Google uses FP8 precision as the main yardstick for Ironwood. The company claims an Ironwood pod outperforms comparable supercomputers by a factor of 24. That claim needs caution: some high-performance computing systems lack native FP8 hardware support, so their published peak figures are measured at higher precisions such as FP64. Setting FP8 throughput against those figures is therefore not a like-for-like comparison.
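Here is a minimal Python sketch of the caution above. A hypothetical `PeakThroughput` record tags each peak figure with the numeric format it was measured in. A hypothetical `speedup` helper then refuses to compute a ratio when the formats differ. Neither comes from Google or from any benchmark suite. The point is only that a speedup is meaningful when both sides are measured the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakThroughput:
    """A peak throughput figure plus the numeric format it was measured in (hypothetical type)."""
    system: str
    flops: float      # floating-point operations per second
    precision: str    # e.g. "FP8", "FP64"


def speedup(new: PeakThroughput, baseline: PeakThroughput) -> float:
    """Return new/baseline, refusing to mix numeric formats.

    An FP8 operation and an FP64 operation do very different amounts of work,
    so a ratio of peak FLOPS across formats says little about real speed.
    """
    if new.precision != baseline.precision:
        raise ValueError(
            f"cannot compare {new.precision} ({new.system}) "
            f"with {baseline.precision} ({baseline.system})"
        )
    return new.flops / baseline.flops


if __name__ == "__main__":
    pod = PeakThroughput("Ironwood pod", 42.5e18, "FP8")
    # Hypothetical placeholder figure, not any real system's published number.
    other = PeakThroughput("FP64-only supercomputer", 1.0e18, "FP64")
    try:
        print(speedup(pod, other))
    except ValueError as err:
        print("not comparable:", err)
```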

Google left TPU v6, also known as Trillium, out of these direct performance comparisons. It says only that Ironwood delivers twice Trillium’s performance per watt. Google positions Ironwood as the successor to TPU v5p, whereas Trillium succeeded the less powerful TPU v5e. For reference, Trillium peaks at approximately 918 TFLOPS at FP8 precision.
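Combining Google’s 2x performance-per-watt claim with the per-chip peaks quoted in this article gives a rough sense of scale. Those peaks are about 918 TFLOPS for Trillium and 4,614 TFLOPS for Ironwood (the latter appears in the figures below). This sketch assumes both numbers are FP8 peaks. It also assumes the 2x efficiency gain is measured against those same peaks, which Google does not spell out. The implied power ratio is therefore an estimate, not a published number.

```python
# Per-chip peak figures as quoted in this article (FP8, TFLOPS).
TRILLIUM_PEAK_TFLOPS = 918.0
IRONWOOD_PEAK_TFLOPS = 4614.0

# Google's stated performance-per-watt gain of Ironwood over Trillium.
PERF_PER_WATT_GAIN = 2.0


def raw_peak_ratio(new_tflops: float, old_tflops: float) -> float:
    """Ratio of raw per-chip peak throughput."""
    return new_tflops / old_tflops


def implied_power_ratio(raw_ratio: float, perf_per_watt_gain: float) -> float:
    """Since perf/watt = perf / power, power_new / power_old = raw_ratio / perf_per_watt_gain."""
    return raw_ratio / perf_per_watt_gain


raw = raw_peak_ratio(IRONWOOD_PEAK_TFLOPS, TRILLIUM_PEAK_TFLOPS)
power = implied_power_ratio(raw, PERF_PER_WATT_GAIN)
print(f"raw peak ratio: {raw:.2f}x")                  # 5.03x
print(f"implied per-chip power ratio: {power:.2f}x")  # 2.51x
```

Under those assumptions, Ironwood’s raw per-chip peak is about 5x Trillium’s. Doubling performance per watt would then imply each chip draws roughly 2.5x the power at peak. Real power draw depends heavily on the workload.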

Inside Ironwood: A Performance Powerhouse

Ironwood offers a substantial jump in processing power over earlier Google TPUs. Google plans to deploy it in very large, liquid-cooled clusters of up to 9,216 chips. An enhanced Inter-Chip Interconnect (ICI) links the chips so that data moves between them quickly and efficiently across the whole system.

This processing capacity will be available both to Google Cloud developers and to Google’s own AI research teams. Ironwood will come in two configurations: a 256-chip system for less demanding AI workloads and a 9,216-chip cluster for the most demanding ones.
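The aggregate figures for each configuration follow from the per-chip numbers given in the next paragraph. Those are 4,614 TFLOPS peak and 192 GB of memory per chip (Ironwood uses high-bandwidth memory, or HBM). This sketch simply multiplies them by the chip count. It assumes peak throughput scales linearly with chip count; real workloads will see less because of communication and utilization overheads.

```python
# Per-chip Ironwood figures as reported by Google.
PER_CHIP_PEAK_TFLOPS = 4614.0  # FP8 peak, TFLOPS
PER_CHIP_HBM_GB = 192          # HBM capacity, GB

# The two announced configurations, by chip count.
CONFIGURATIONS = (256, 9216)


def aggregate(chips: int) -> tuple[float, float]:
    """Return (peak EFLOPS, total HBM in TB), assuming linear scaling with chip count."""
    exaflops = chips * PER_CHIP_PEAK_TFLOPS / 1e6  # 1 EFLOPS = 1,000,000 TFLOPS
    hbm_tb = chips * PER_CHIP_HBM_GB / 1000        # decimal units: 1 TB = 1,000 GB
    return exaflops, hbm_tb


if __name__ == "__main__":
    for chips in CONFIGURATIONS:
        ef, tb = aggregate(chips)
        print(f"{chips:5d} chips: {ef:6.2f} EFLOPS peak, {tb:8.1f} TB HBM")
```

For the full cluster this reproduces Google’s 42.5-exaflop figure (9,216 × 4,614 TFLOPS ≈ 42.52 EFLOPS) and gives roughly 1.77 PB of HBM in total. The 256-chip system comes to about 1.18 EFLOPS peak.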

A fully configured 9,216-chip Ironwood pod delivers a peak of 42.5 exaflops of FP8 inference compute. According to Google, each chip peaks at 4,614 TFLOPS, a major step up from earlier TPU generations. Each chip carries 192 GB of high-bandwidth memory, six times as much as Trillium. Memory bandwidth reaches 7.2 TB/s per chip, a 4.5x increase.
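The stated multipliers also let you back out the Trillium baseline they imply. The ratio of compute to bandwidth, in turn, shows why the bandwidth increase matters for inference. This sketch reads the 7.2 bandwidth figure as terabytes per second (TB/s), as Google’s own specifications do, and applies the simple roofline model. The resulting ridge point is a peak-based estimate, not a measured property of the chip.

```python
IRONWOOD_HBM_GB = 192.0
IRONWOOD_HBM_TB_PER_S = 7.2    # assumed to mean terabytes per second
IRONWOOD_PEAK_TFLOPS = 4614.0  # FP8 peak

CAPACITY_GAIN = 6.0   # "sixfold increase" over Trillium
BANDWIDTH_GAIN = 4.5  # "4.5x" over Trillium

# Back out the Trillium baseline implied by the stated multipliers.
trillium_hbm_gb = IRONWOOD_HBM_GB / CAPACITY_GAIN              # 32 GB
trillium_bw_tb_per_s = IRONWOOD_HBM_TB_PER_S / BANDWIDTH_GAIN  # 1.6 TB/s

# Roofline "ridge point": FLOPs a kernel must perform per byte of memory
# traffic before peak compute, rather than memory bandwidth, becomes the limit.
ridge_flops_per_byte = (IRONWOOD_PEAK_TFLOPS * 1e12) / (IRONWOOD_HBM_TB_PER_S * 1e12)

print(f"implied Trillium HBM: {trillium_hbm_gb:.0f} GB")
print(f"implied Trillium bandwidth: {trillium_bw_tb_per_s:.1f} TB/s")
print(f"Ironwood ridge point: {ridge_flops_per_byte:.0f} FLOP/byte")
```

The ridge point comes out at roughly 640 FLOPs per byte. A kernel that performs fewer operations per byte of memory traffic than that is limited by memory bandwidth rather than by compute. This is often the case in token-by-token inference.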

Google has unveiled Ironwood, its latest custom chip and the seventh generation of its Tensor Processing Unit (TPU) architecture. The design targets the demands of Google’s Gemini models, in particular their “thinking” capability, which is Google’s term for simulated reasoning.

Google repeatedly stresses the tight integration between its AI models and its custom infrastructure. Ironwood is central to that strategy: it promises much faster inference and support for larger context windows in advanced models. Google calls Ironwood its most scalable and powerful TPU yet. The company says it will let AI systems help users by gathering data and producing results on their own, in line with Google’s “agentic AI” concept and its vision of an “age of inference.”