Cerebras — Faster Tokens Please

Myron Xie · SemiAnalysis · May 13, 2026 at 18:18 · ⏱ 52 min read
Summary
Cerebras's wafer-scale SRAM architecture delivers unmatched inference speed (thousands of tokens per second) by dedicating 50% of silicon to on-chip SRAM, making it ideal for fast-token premium tiers. The 750 MW OpenAI deal validates this approach, but the architecture faces hard constraints: SRAM scaling is flat beyond 5nm, off-wafer I/O bandwidth is only 150 GB/s (with shoreline I/O density roughly 130x lower than Nvidia's), and the limited 44 GB of SRAM forces pipeline parallelism for models over ~120B parameters. For markets, this means Cerebras is a niche but validated player in low-batch, high-speed inference, while HBM-based GPUs (Nvidia, AMD) remain necessary for large-model throughput and long-context agentic workloads.
  • Cerebras WSE-3 has 44 GB of SRAM with 21 PB/s of bandwidth, but SRAM capacity grew only 10% from WSE-2 to WSE-3 (40 GB to 44 GB), versus a 50% increase in logic transistors.
  • Off-wafer bandwidth is only 150 GB/s (1.2 Tbps) for the entire wafer, compared to 900 GB/s for a single Blackwell GPU via NVLink5; shoreline I/O density is 0.17 GB/s per mm of edge vs Nvidia's ~22 GB/s/mm.
  • The OpenAI deal includes a $1B working capital loan at 6% interest, a 750 MW compute commitment (expandable to 2 GW), and a warrant for 33.4M shares at an exercise price of $0.00001 per share; the warrant is valued at $82.02 per share, implying up to ~$2.74B of potential contra-revenue.
  • On its public cloud, Cerebras serves models only up to 120B parameters (GPT-OSS), with a 128K context window; real-world agentic traces show a P50 input sequence length (ISL) of 96.3K tokens, and 50% of requests exceed 128K.
  • Each CS-3 system has a BOM of ~$450K (after memory price hikes); the TSMC wafer costs ~$20K, and Vicor power delivery and custom cooling each cost nearly as much as the wafer.
  • Cerebras's optimal arithmetic intensity is 0.74 (FP16/int8), making it ideal for low-batch decode kernels; Nvidia's HBM-based GPUs have arithmetic intensities >1000, better suited for throughput with large batches.
  • The WSE-3 is fabricated on TSMC N5, not 3nm; the next-gen CS-4 will use the same N5 wafer with higher power and clocks but no SRAM increase, because SRAM scaling has stalled beyond 5nm.
  • Cerebras plans to hybrid-bond a photonic transceiver wafer (with Ranovus) and a DRAM wafer to address I/O and memory limits, but timeline and thermo-mechanical challenges remain unresolved.
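
Several figures in the bullets above follow from simple arithmetic on other quoted numbers. The Python sketch below re-derives them as a sanity check. Every input is copied from the summary; the constant and function names are illustrative and do not come from the article. The last helper relies on a common rule of thumb, added here as an assumption: weight-bound decode performs about 2 FLOPs per weight per token in the batch, the quoted arithmetic intensities are in FLOPs per byte, and KV-cache and activation traffic are ignored.

```python
# Inputs copied from the summary bullets; names are illustrative.
OFF_WAFER_GB_PER_S = 150          # whole-wafer off-wafer bandwidth
CEREBRAS_GB_PER_S_PER_MM = 0.17   # shoreline I/O density
NVIDIA_GB_PER_S_PER_MM = 22       # approximate, per the summary
WARRANT_SHARES = 33.4e6
WARRANT_VALUE_PER_SHARE = 82.02   # USD
SRAM_WSE2_GB = 40
SRAM_WSE3_GB = 44


def summary_checks() -> dict:
    """Re-derive the headline ratios quoted in the summary."""
    return {
        # 1 byte = 8 bits, 1 Tbps = 1000 Gbps
        "off_wafer_tbps": OFF_WAFER_GB_PER_S * 8 / 1000,
        "shoreline_ratio": NVIDIA_GB_PER_S_PER_MM / CEREBRAS_GB_PER_S_PER_MM,
        "warrant_value_usd_bn": WARRANT_SHARES * WARRANT_VALUE_PER_SHARE / 1e9,
        "sram_growth_pct": (SRAM_WSE3_GB / SRAM_WSE2_GB - 1) * 100,
    }


def break_even_batch(machine_balance: float, bytes_per_weight: float) -> float:
    """Batch size at which weight-bound decode reaches a given FLOPs/byte balance.

    Assumption (not from the article): each weight is read once per step and
    used for 2 FLOPs per sequence in the batch, so
    intensity ~= 2 * batch / bytes_per_weight.
    """
    return machine_balance * bytes_per_weight / 2


if __name__ == "__main__":
    for name, value in summary_checks().items():
        print(f"{name}: {value:.2f}")
    # FP16 weights (2 bytes each) at the two intensities quoted in the summary
    print("break-even batch at 0.74:", break_even_batch(0.74, 2))
    print("break-even batch at 1000:", break_even_batch(1000, 2))
```

Under these assumptions, FP16 decode reaches the quoted Cerebras intensity at a batch below one sequence, while a GPU at an intensity of 1,000 needs a batch of roughly 1,000. That is one way to read the low-batch versus large-batch contrast in the summary.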
Length 52,765 chars
Category finance
Trade Ideas
Myron Xie Substack author, SemiAnalysis
Cerebras is ramping wafer orders at TSMC for the WSE-3 on N5; the article notes 'demand surge is already visible in TSMC’s wafer loadings, which step up materially each quarter through the year to meet OpenAI’s deployment requirements.' Each wafer costs ~$20K but requires custom masks per batch, adding to TSMC revenue. Risk: Cerebras is a single customer; wafer volumes are small compared to Nvidia/AMD but growing.
Vicor supplies custom power delivery modules for each WSE-3 engine block, delivering 25kW via 84 Vicor power bricks. The article states 'VICR content in each WSE is not too far from TSMC’s content,' meaning Vicor is a major BOM contributor. Cerebras's ramp directly benefits Vicor. Risk: Customer concentration; Vicor's revenue is heavily dependent on Cerebras shipments; any delays or cancellations in the OpenAI deal hurt Vicor.
Trane Technologies (TT) acquired LiquidStack, Cerebras's primary cooling partner. The article notes LiquidStack developed L2L single-phase CDUs sized to CS-3's high flow rate (~4 LPM/kW vs Nvidia's ~1.5 LPM/kW). As Cerebras deploys thousands of CS-3 systems, cooling infrastructure demand grows for TT. Risk: Cerebras's next-gen CS-4 aims to lower flow rate to 1.5–1.7 LPM/kW, which could reduce the per-unit cooling revenue.
The article criticizes AMD's capital allocation: 'AMD did ~$221 million of buybacks last quarter yet internally multiple AMD internal teams continue to lack development interconnected GPU clusters.' This suggests AMD is underinvesting in networking/scale-up fabric, which matters for inference and training clusters. Risk: AMD may still succeed via other routes; the critique is based on one anecdote about internal GPU cluster shortages.
The article extensively compares Cerebras against Nvidia GPUs and shows that for large models with long context (e.g., DeepSeek V4), Nvidia's HBM-based systems (GB300 NVL72) offer far more capacity (20TB HBM per rack vs 44GB SRAM per wafer) and are necessary for throughput. Cerebras's limitations reinforce Nvidia's dominance in mainstream inference. Risk: Cerebras and Groq could carve out a high-speed premium tier that reduces Nvidia's pricing power in that segment.
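
The capacity gap quoted above can be put in rough numbers. This is a minimal sketch, assuming decimal units (1 TB = 1,000 GB) and counting weights only, so it gives a lower bound that ignores KV cache and activations. The helper names and the example model size and precision are hypothetical, chosen for illustration rather than taken from the article.

```python
import math

HBM_PER_RACK_GB = 20_000   # 20 TB per GB300 NVL72 rack, per the summary
SRAM_PER_WAFER_GB = 44     # per WSE-3 wafer, per the summary


def capacity_ratio() -> float:
    """How many wafers' worth of SRAM equals one rack's HBM."""
    return HBM_PER_RACK_GB / SRAM_PER_WAFER_GB


def min_wafers_for_weights(params_billion: float, bytes_per_param: float) -> int:
    """Lower bound on wafers needed just to hold the weights in SRAM."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return math.ceil(weights_gb / SRAM_PER_WAFER_GB)


if __name__ == "__main__":
    print(f"rack HBM / wafer SRAM: {capacity_ratio():.0f}x")
    # Hypothetical 500B-parameter model stored at 1 byte per parameter
    print("wafers for weights:", min_wafers_for_weights(500, 1))
```

On these inputs one rack holds roughly 455 wafers' worth of memory, and the hypothetical 500B-parameter model would need at least 12 wafers for weights alone.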

This newsletter, published May 13, 2026, features Myron Xie discussing TSM, VICR, TT, AMD, NVDA. 5 trade ideas extracted by AI with direction and confidence scoring.

Speakers: Myron Xie  · Tickers: TSM, VICR, TT, AMD, NVDA