Google’s AI Infrastructure Layer Map

Damnang · Damnang’s Substack · April 28, 2026 at 16:40 · 14 min read

Summary

The article argues that the center of gravity in AI infrastructure is shifting from training to inference. That shift is driving changes in chip design (Google split its TPU into 8t and 8i variants) and stack-level competition to reduce token cost. For markets, this means hyperscaler differentiation will increasingly depend on integrated hardware-software stacks. That would benefit suppliers of inference-optimized chips and of assistant chips that ease memory bottlenecks, while NVIDIA remains a key player in clouds that run TPUs and GPUs side by side.

• Google split its 8th-gen TPU into training (TPU 8t) and inference (TPU 8i) variants, signaling the workload shift from training to inference.
• A TPU 8t superpod bundles 9,600 chips delivering 121 FP4 exaFLOPS and a 2PB shared HBM pool; the article treats its 97% goodput, a reliability measure, as the most important metric.
• TPU 8i focuses on low-latency inference using Boardfly topology and CAE, with 288GB HBM and 384MB on-chip SRAM per chip.
• MediaTek is reported to be involved in the design of the inference TPU, reflecting a move to low-power, cost-optimized silicon partners.
• Marvell is in discussions with Google for assistant chips that sit alongside the TPU to reduce memory bottlenecks.
• Google also announced A5X bare metal instances based on NVIDIA Vera Rubin, indicating a multi-engine strategy rather than a full replacement of NVIDIA with TPU.
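
To put the TPU 8t superpod totals in per-chip terms, here is a rough back-of-the-envelope sketch. It is not from the article: it assumes decimal units, an even split of compute and HBM across all 9,600 chips, and that goodput means the share of wall-clock time spent on useful work. The 30-day run length is a hypothetical chosen only for illustration.

```python
# Back-of-the-envelope figures derived from the superpod totals quoted above.
# Assumptions (not stated in the article): decimal units, an even split across
# chips, and goodput = fraction of wall-clock time spent on useful work.

CHIPS_PER_SUPERPOD = 9_600
SUPERPOD_FP4_EXAFLOPS = 121
SUPERPOD_HBM_PB = 2
GOODPUT = 0.97
HYPOTHETICAL_RUN_HOURS = 30 * 24  # illustrative 30-day job, not from the article


def per_chip_petaflops(exaflops: float, chips: int) -> float:
    """Average FP4 petaFLOPS per chip (1 exaFLOPS = 1,000 petaFLOPS)."""
    return exaflops * 1_000 / chips


def per_chip_hbm_gb(petabytes: float, chips: int) -> float:
    """Average share of the pooled HBM per chip in GB (1 PB = 1,000,000 GB)."""
    return petabytes * 1_000_000 / chips


def unproductive_hours(goodput: float, run_hours: float) -> float:
    """Hours of a run not spent on useful work at the given goodput."""
    return (1 - goodput) * run_hours


if __name__ == "__main__":
    pf = per_chip_petaflops(SUPERPOD_FP4_EXAFLOPS, CHIPS_PER_SUPERPOD)
    gb = per_chip_hbm_gb(SUPERPOD_HBM_PB, CHIPS_PER_SUPERPOD)
    lost = unproductive_hours(GOODPUT, HYPOTHETICAL_RUN_HOURS)
    print(f"FP4 per chip: about {pf:.1f} petaFLOPS")
    print(f"HBM per chip: about {gb:.1f} GB")
    print(f"Unproductive time in a 30-day run: about {lost:.1f} hours")
```

Under these assumptions the totals work out to roughly 12.6 FP4 petaFLOPS and roughly 208 GB of pooled HBM per chip, and a 97% goodput leaves under a day of unproductive time in a 30-day run. Treat these as derived estimates, not published per-chip specifications.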

Category finance

Ideas

Damnang Substack author, Damnang’s Substack

Google's A5X instance based on NVIDIA Vera Rubin shows Google is not replacing NVIDIA but running both TPU and NVIDIA GPUs in its cloud, confirming continued demand for NVIDIA's inference-optimized GPUs. Risk: Google's own TPU 8i may capture a growing share of inference workloads, potentially limiting NVIDIA's growth in Google Cloud.

MediaTek is reported to be involved in the inference TPU design, bringing low-power design and TSMC process integration expertise. If confirmed, this would be a new design win that could expand MediaTek's role in AI accelerators. Risk: MediaTek's involvement is unconfirmed and may be limited to a single generation; competition from other ASIC partners could limit revenue impact.

Marvell is in discussions with Google for assistant chips that would sit alongside the TPU and ease memory bottlenecks, a key constraint in AI workloads. This could represent a new product category and revenue stream for Marvell. Risk: Discussions are preliminary, with no official confirmation or volume commitments, and Marvell faces competition from other custom silicon vendors.

This newsletter, published April 28, 2026, features Damnang discussing NVDA, 2454.TW, and MRVL. Three trade ideas were extracted by AI, each with direction and confidence scoring.

Speakers: Damnang · Tickers: NVDA, 2454.TW, MRVL