Nvidia – The Inference Kingdom Expands

Dylan Patel · SemiAnalysis · March 24, 2026 at 00:27 · 34 min read
Summary
Nvidia's GTC 2026 reveals a multi-pronged strategy to extend its inference dominance: integrating Groq's LPU for low-latency decode via attention-FFN disaggregation, scaling rack systems from copper to CPO (Rubin Ultra NVL576, Feynman NVL1152), and pushing into storage with CMX/STX. The article argues Nvidia's co-packaged optical ramp and dense CPU racks (Vera ETL256) will pressure incumbent networking and memory suppliers while creating new foundry demand for TSMC's advanced packaging.
  • Nvidia structured its $20B Groq 'acquisition' as an IP license/hire to avoid antitrust review; Groq's LPU 3 (LP30) will be fabricated on Samsung SF4, while the next-gen LP40 moves to TSMC N3P with CoWoS-R and hybrid bonded DRAM from SK Hynix.
  • Attention-FFN disaggregation (AFD) maps attention to GPUs (stateful, high HBM) and FFN to LPUs (stateless, high SRAM bandwidth) to improve decode latency and throughput for sparse MoE models.
  • LPX rack has 32 compute trays, each with 16 LPUs, 2 Altera FPGAs (acting as NIC, PCIe bridge, and extra DDR5 pool), and a Granite Rapids host CPU; intra-node all-to-all C2C uses 4x100G per LPU via PCB traces.
  • CPO debut is deferred to Rubin Ultra NVL576 (8 Oberon racks, two-tier all-to-all with CPO between racks) and Feynman NVL1152 (8 Kyber racks, CPO for inter-rack but copper within rack for now); Jensen hinted Feynman could be 'all CPO'.
  • Vera ETL256 packs 256 CPUs per rack using 32 compute trays, Spectrum-X 4-switch multiplane topology, all-copper intra-rack, and liquid cooling; rationale is cost savings from eliminating optics on the spine.
  • STX reference storage rack uses BF-4 DPUs with Vera CPU, dual CX-9 NICs, and SOCAMMs; Nvidia named 15 storage vendors as STX supporters, signaling push into infrastructure software layer.
  • Nvidia's Kyber rack for Rubin Ultra NVL144 has 36 compute blades (4 GPUs + 2 Vera each), 72 NVLink 7 switches, and uses copper flyover cables to midplane; no CPO within rack.
  • NVLink 7 switch expected to double bandwidth and radix vs NVLink 6 to enable all-to-all NVL288, but current supply chain evidence is mixed; NVL288 may require oversubscription without higher-radix switch.
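
The per-rack counts quoted in the bullets above can be cross-checked with a few lines of arithmetic. This is only a sketch: the input figures come from the summary, but the variable names are hypothetical, and it assumes that the number in each product name (NVL576, NVL1152, ETL256) counts the same unit the bullets call a GPU or CPU and that racks and trays are evenly populated.

```python
# Inputs are the figures quoted in the summary bullets; everything derived
# below assumes evenly populated racks and trays (an assumption, not a spec).

# LPX rack: 32 compute trays, 16 LPUs per tray
lpx_trays = 32
lpus_per_tray = 16
lpus_per_rack = lpx_trays * lpus_per_tray

# Kyber rack for Rubin Ultra NVL144: 36 blades, each with 4 GPUs + 2 Vera CPUs
kyber_blades = 36
gpus_per_kyber_rack = kyber_blades * 4
veras_per_kyber_rack = kyber_blades * 2

# Multi-rack systems: total GPUs spread over 8 racks each
nvl576_gpus_per_rack = 576 // 8     # Rubin Ultra NVL576, 8 Oberon racks
nvl1152_gpus_per_rack = 1152 // 8   # Feynman NVL1152, 8 Kyber racks

# Vera ETL256: 256 CPUs across 32 compute trays
etl256_cpus_per_tray = 256 // 32

for name, value in [
    ("LPUs per LPX rack", lpus_per_rack),
    ("GPUs per Kyber rack", gpus_per_kyber_rack),
    ("Vera CPUs per Kyber rack", veras_per_kyber_rack),
    ("GPUs per rack, NVL576", nvl576_gpus_per_rack),
    ("GPUs per rack, NVL1152", nvl1152_gpus_per_rack),
    ("CPUs per tray, ETL256", etl256_cpus_per_tray),
]:
    print(f"{name}: {value}")
```

Under these assumptions the figures are internally consistent: 36 blades of 4 GPUs give the 144 in NVL144, and the 144 GPUs per rack implied by NVL1152 match the Kyber blade count.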
Length 34,431 chars
Category finance
Trade Ideas
Dylan Patel Founder, CEO, and Chief Analyst at SemiAnalysis
Nvidia's LP40 moves to TSMC N3P with CoWoS-R, and the article states 'TSMC’s N3 ... is putting a cap on accelerator production' — TSMC's advanced node and packaging capacity is a bottleneck, giving it pricing power and high utilization. Risk: Geopolitical risk on Taiwan; potential shift to Samsung for some nodes (LP30 on SF4) could dilute TSMC's share.
Dylan Patel Founder, CEO, and Chief Analyst at SemiAnalysis
Altera FPGAs (now part of Intel) serve as 'Fabric Expansion Logic' in every LPX compute tray, handling NIC conversion, PCIe bridging, and extra DDR5 memory — this is a new, high-volume design win for Intel's programmable logic business. Risk: Intel's Altera unit faces competition from Xilinx/AMD; Nvidia could in-source FPGA functionality in future generations.

This newsletter, published March 24, 2026, features Dylan Patel discussing TSM and INTC. Two trade ideas were extracted by AI, with direction and confidence scoring.

Speakers: Dylan Patel  · Tickers: TSM, INTC