
The true cost of a GPU cluster

Watch on YouTube ↗  |  June 26, 2026 at 14:56  |  13:41  |  SemiAnalysis
Speakers
Dylan Patel — Founder, CEO, and Chief Analyst at SemiAnalysis

Summary

The video dissects the real total cost of renting GPU clusters for AI training and inference. It argues that the commonly compared metric of dollar-per-GPU-hour is misleading because hidden costs like storage, networking, support, setup, debugging, and especially goodput (useful work completed) dramatically affect true economics. The presenter introduces ClusterMAX, a SemiAnalysis GPU cloud ranking that incorporates these factors, and advises customers to ask detailed operational questions before committing to a provider.
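
To make the goodput argument concrete, here is a minimal sketch of how a list price could be converted into a cost per useful GPU-hour. It assumes goodput is expressed as a fraction between 0 and 1 and that hidden costs (storage, networking, support, and so on) can be folded into a single per-GPU-hour overhead. Neither assumption comes from the video, and the function name and all prices below are hypothetical.

```python
def effective_cost_per_useful_gpu_hour(list_price, goodput, overhead_per_gpu_hour=0.0):
    """Cost of one GPU-hour of useful work.

    list_price            -- advertised dollars per GPU-hour
    goodput               -- assumed fraction of paid time doing useful work, in (0, 1]
    overhead_per_gpu_hour -- hidden costs amortized per GPU-hour (hypothetical)
    """
    if not 0.0 < goodput <= 1.0:
        raise ValueError("goodput must be in (0, 1]")
    return (list_price + overhead_per_gpu_hour) / goodput


# Hypothetical providers: A is cheaper on paper, B has higher goodput.
provider_a = effective_cost_per_useful_gpu_hour(2.00, 0.80)
provider_b = effective_cost_per_useful_gpu_hour(2.20, 0.95)
print(f"A: ${provider_a:.2f}  B: ${provider_b:.2f}")
```

Under these made-up numbers, provider A works out to about $2.50 per useful GPU-hour and provider B to about $2.32, so the provider with the higher sticker price is the cheaper one once goodput is counted.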

  • Price per GPU-hour is only one of eight cost areas; storage, networking, control plane, support, goodput, setup, and debugging add hidden expenses.
  • Goodput measures actual productive work after failures, restarts, and tuning; low goodput effectively wastes money.
  • Large clusters face a constant stream of failures; recovery speed and the availability of hot spares heavily influence real cost.
  • Three failure recovery approaches: cold spare restart (slowest), hot spare restart (faster), and fault-tolerant training with tradeoffs.
  • ClusterMAX is introduced as an independent tier ranking that evaluates GPU clouds on total cost holistically, not just price.
  • Potential renters should probe providers on failure rates, recovery time, hot spares, health checks, storage throughput, and expected goodput.
  • The ultimate goal is to buy faster research and model progress, not just GPU hours; the cheapest GPU is the one that finishes the work fastest.
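
The points about failure rates and recovery speed can be sketched with a simple back-of-the-envelope model. This is an illustration under stated assumptions, not the method used in the video or in ClusterMAX: it assumes independent GPU failures, a synchronous job in which any single failure stops the whole run, and an average of half a checkpoint interval of work lost per failure. Every number below is hypothetical, and fault-tolerant training is not modeled.

```python
def cluster_mtbf_hours(num_gpus, gpu_mtbf_hours):
    """Mean time between failures for the whole job, assuming independent
    failures and that any single GPU failure interrupts the job."""
    return gpu_mtbf_hours / num_gpus


def estimated_goodput(num_gpus, gpu_mtbf_hours, recovery_hours, checkpoint_interval_hours):
    """Rough fraction of wall-clock time spent on work that is kept.

    Each failure cycle consists of `mtbf` hours of running plus
    `recovery_hours` of downtime; on average half a checkpoint interval
    of the running time is redone after the restart.
    """
    mtbf = cluster_mtbf_hours(num_gpus, gpu_mtbf_hours)
    useful = max(0.0, mtbf - checkpoint_interval_hours / 2)
    return useful / (mtbf + recovery_hours)


# Hypothetical job: 10,000 GPUs, each failing once per 50,000 hours on average,
# with a checkpoint written every hour.
NUM_GPUS = 10_000
GPU_MTBF_HOURS = 50_000
CHECKPOINT_INTERVAL_HOURS = 1.0

mtbf = cluster_mtbf_hours(NUM_GPUS, GPU_MTBF_HOURS)
cold = estimated_goodput(NUM_GPUS, GPU_MTBF_HOURS, 2.0, CHECKPOINT_INTERVAL_HOURS)   # cold spare: assumed 2 h recovery
hot = estimated_goodput(NUM_GPUS, GPU_MTBF_HOURS, 0.25, CHECKPOINT_INTERVAL_HOURS)   # hot spare: assumed 15 min recovery

print(f"job-level MTBF: {mtbf:.1f} h")
print(f"goodput with cold spares: {cold:.2f}")
print(f"goodput with hot spares:  {hot:.2f}")
```

With these assumed inputs the job is interrupted roughly every 5 hours, and shortening recovery from 2 hours to 15 minutes raises estimated goodput from about 0.64 to about 0.86, which is why recovery time and hot spares are worth asking a provider about.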