Google’s AI Infrastructure Layer Map

Damnang · Damnang’s Substack · April 28, 2026 at 16:40 · 14 min read
Summary
The article argues that AI infrastructure's center of gravity is shifting from training to inference, driving changes in chip design (Google split its TPU into the 8t and 8i) and stack-level competition to cut cost per token. For markets, this means hyperscaler differentiation will increasingly depend on integrated hardware-software stacks, benefiting suppliers of inference-optimized chips and of assistant chips that relieve memory bottlenecks, while NVIDIA remains a key player in clouds that run TPUs and GPUs side by side.
  • Google split its 8th-gen TPU into training (TPU 8t) and inference (TPU 8i) variants, signaling the workload shift from training to inference.
  • A TPU 8t superpod bundles 9,600 chips, delivering 121 FP4 exaFLOPS and a 2PB shared HBM pool; the article treats its 97% goodput, a reliability measure, as the most important metric.
  • TPU 8i focuses on low-latency inference using Boardfly topology and CAE, with 288GB HBM and 384MB on-chip SRAM per chip.
  • MediaTek is reported to be involved in the design of the inference TPU, reflecting a move to low-power, cost-optimized silicon partners.
  • Marvell is in discussions with Google for assistant chips that sit alongside the TPU to reduce memory bottlenecks.
  • Google also announced A5X bare metal instances based on NVIDIA Vera Rubin, indicating a multi-engine strategy rather than a full replacement of NVIDIA with TPU.
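As a rough sanity check on the superpod figures above, the pod-level numbers can be divided down to per-chip values. This is a back-of-envelope sketch, not data from the article: the function names are hypothetical, decimal units (1 PB = 1,000,000 GB) are assumed, and goodput is treated as a simple multiplier on peak throughput, which ignores real-world utilization.

```python
# Back-of-envelope arithmetic on the TPU 8t superpod figures quoted in the summary.
# Assumptions (not from the article): decimal units, and goodput applied as a
# plain multiplier on peak FLOPS.

CHIPS_PER_SUPERPOD = 9_600   # chips per TPU 8t superpod (from the summary)
POD_FP4_EXAFLOPS = 121.0     # peak FP4 exaFLOPS per superpod (from the summary)
POD_HBM_PB = 2.0             # shared HBM pool in PB (from the summary, likely rounded)
GOODPUT = 0.97               # goodput fraction (from the summary)


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Peak FP4 petaFLOPS per chip: 1 exaFLOPS = 1,000 petaFLOPS."""
    return pod_exaflops * 1_000 / chips


def per_chip_hbm_gb(pod_pb: float, chips: int) -> float:
    """HBM per chip in GB, assuming 1 PB = 1,000,000 GB."""
    return pod_pb * 1_000_000 / chips


def goodput_adjusted_exaflops(pod_exaflops: float, goodput: float) -> float:
    """Peak throughput scaled by the goodput fraction (a simplification)."""
    return pod_exaflops * goodput


if __name__ == "__main__":
    print(f"{per_chip_petaflops(POD_FP4_EXAFLOPS, CHIPS_PER_SUPERPOD):.1f} PFLOPS FP4 per chip")
    print(f"{per_chip_hbm_gb(POD_HBM_PB, CHIPS_PER_SUPERPOD):.0f} GB HBM per chip")
    print(f"{goodput_adjusted_exaflops(POD_FP4_EXAFLOPS, GOODPUT):.1f} exaFLOPS after goodput")
```

Under these assumptions the superpod works out to roughly 12.6 FP4 petaFLOPS and about 208 GB of HBM per chip. Because the 2PB pool figure is presumably rounded, the per-chip memory value is approximate.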
Length 14,547 chars
Category finance
Trade Ideas
Damnang Substack author, Damnang’s Substack
Google's A5X instance based on NVIDIA Vera Rubin shows Google is not replacing NVIDIA but running both TPU and NVIDIA GPUs in its cloud, confirming continued demand for NVIDIA's inference-optimized GPUs. Risk: Google's own TPU 8i may capture a growing share of inference workloads, potentially limiting NVIDIA's growth in Google Cloud.
Marvell is in discussions with Google for assistant chips that sit alongside the TPU to relieve memory bottlenecks, a key constraint in AI workloads. This could represent a new product category and revenue stream for Marvell. Risk: Discussions are preliminary, with no official confirmation or volume commitments, and Marvell faces competition from other custom silicon vendors.
MediaTek is reported to be involved in the inference TPU design, bringing low-power design and TSMC process integration expertise. If confirmed, this would be a new design win that could expand MediaTek's role in AI accelerators. Risk: MediaTek's involvement is unconfirmed and may be limited to a single generation; competition from other ASIC partners could limit revenue impact.

This newsletter, published April 28, 2026, features Damnang discussing NVDA, MRVL, and 2454.TW. Three trade ideas were extracted by AI, each with direction and confidence scoring.

Speakers: Damnang  · Tickers: NVDA, MRVL, 2454.TW