Summary
The video dissects the real total cost of renting GPU clusters for AI training and inference. It argues that the commonly compared metric, dollars per GPU-hour, is misleading: hidden costs such as storage, networking, support, setup, and debugging, and above all goodput (useful work completed), dramatically affect the true economics. The presenter introduces ClusterMAX, a SemiAnalysis GPU cloud ranking that incorporates these factors, and advises customers to ask detailed operational questions before committing to a provider.
- Price per GPU-hour is only one of eight cost areas; storage, networking, control plane, support, goodput, setup, and debugging add hidden expenses.
- Goodput measures actual productive work after failures, restarts, and tuning; low goodput effectively wastes money.
- Large clusters face an ever-present probability that some component fails; recovery speed and the availability of hot spares heavily influence real costs.
- Three failure recovery approaches are covered: cold spare restart (slowest), hot spare restart (faster), and fault-tolerant training, which comes with its own tradeoffs.
- ClusterMAX is introduced as an independent tier ranking that evaluates GPU clouds on total cost holistically, not just price.
- Potential renters should probe providers on failure rates, recovery time, hot spares, health checks, storage throughput, and expected goodput.
- The ultimate goal is buying time-to-research and model progress, not just GPU hours; the cheapest GPU is the one that finishes work fastest.
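
The goodput argument above can be made concrete with some simple arithmetic. The sketch below is illustrative only and is not taken from the video: it assumes that effective cost is list price divided by goodput, and that goodput can be roughly approximated from a failure rate and a recovery time per failure. The function names, prices, failure counts, and recovery times are all hypothetical.

```python
def effective_cost_per_useful_gpu_hour(list_price: float, goodput: float) -> float:
    """Price paid per GPU-hour of useful work.

    Assumption: goodput is the fraction (0 < goodput <= 1) of paid
    GPU time that produces useful work.
    """
    if not 0 < goodput <= 1:
        raise ValueError("goodput must be in the range (0, 1]")
    return list_price / goodput


def estimated_goodput(failures_per_day: float, recovery_hours: float) -> float:
    """Very rough goodput estimate: the share of each day not lost to recovery.

    Assumption: every failure stalls the whole job for `recovery_hours`,
    and nothing else (tuning, checkpoint overhead) reduces goodput.
    """
    lost_hours = failures_per_day * recovery_hours
    return max(0.0, 1.0 - lost_hours / 24.0)


# Hypothetical providers: A is cheaper on paper, B delivers higher goodput.
provider_a = effective_cost_per_useful_gpu_hour(list_price=2.00, goodput=0.70)
provider_b = effective_cost_per_useful_gpu_hour(list_price=2.40, goodput=0.95)
print(f"A: ${provider_a:.2f} per useful GPU-hour")  # A: $2.86 per useful GPU-hour
print(f"B: ${provider_b:.2f} per useful GPU-hour")  # B: $2.53 per useful GPU-hour

# Hypothetical recovery times: same failure rate, cold spare vs. hot spare.
cold_spare = estimated_goodput(failures_per_day=2, recovery_hours=3.0)
hot_spare = estimated_goodput(failures_per_day=2, recovery_hours=0.5)
print(f"cold spare goodput: {cold_spare:.2f}")  # cold spare goodput: 0.75
print(f"hot spare goodput:  {hot_spare:.2f}")   # hot spare goodput:  0.96
```

Under these made-up numbers, the provider with the higher list price ends up cheaper per unit of useful work, which is the kind of comparison the questions about failure rates, recovery time, and expected goodput are meant to enable.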