How Makora Generates CUDA Kernels That Beat Hand-Tuned Code | Researcher Conversations at GTC

May 27, 2026 at 22:15  |  26:36  |  SemiAnalysis

Summary

Mohamed Abdelfattah discusses Makora's automated GPU kernel generation and novel sequential Monte Carlo speculative decoding techniques. The conversation covers performance optimization, reward hacking mitigation, and hardware-specific advantages for AMD vs Nvidia.

  • Makora automates high-performance GPU kernel generation and system-level AI inference optimization.
  • Sequential Monte Carlo speculative decoding achieves a 5x speedup over an SGLang baseline by maintaining multiple parallel drafts.
  • Makora differentiates by selling end-to-end performance rather than just a code generation compiler.
  • Research on FP4 quantization with redundant zero remapping offers FP5-level accuracy at an FP4 memory footprint.
  • AMD hardware offers advantages for certain optimizations due to a shared FP6/FP4 data path.
  • Makora's eval pipeline detects reward hacking and is sold as a service to other companies.
  • Future plans include expanding to training and reinforcement learning with a user-friendly deployment engine.
  • Open-source releases are planned for research components like SMC speculative decoding.
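
To make the "redundant zero remapping" bullet concrete, here is a minimal Python sketch of the general idea, not Makora's actual method. It assumes the common E2M1 layout for FP4 (1 sign bit, 2 exponent bits, 1 mantissa bit), in which +0 and -0 occupy two of the 16 codes but represent the same value. The summary does not say what the spare code is remapped to, so `extra_level` and the value 0.25 used below are purely hypothetical, as are the helper names `fp4_levels` and `quantize`.

```python
# Magnitudes representable in FP4 E2M1 (assumed layout: 1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fp4_levels(extra_level=None):
    """Return the sorted distinct values that the 16 FP4 codes can represent.

    Standard E2M1 spends two codes on +0 and -0, so only 15 values are distinct.
    If extra_level is given, the redundant zero code is (hypothetically)
    reassigned to that value, giving 16 distinct levels.
    """
    levels = set(E2M1_MAGNITUDES) | {-m for m in E2M1_MAGNITUDES}  # -0.0 == 0.0 collapses
    if extra_level is not None:
        levels.add(extra_level)
    return sorted(levels)


def quantize(x, levels):
    """Round x to the nearest representable level."""
    return min(levels, key=lambda v: abs(v - x))


standard = fp4_levels()
remapped = fp4_levels(extra_level=0.25)  # 0.25 is an illustrative choice only

print(len(standard), len(remapped))
print(quantize(0.3, standard), quantize(0.3, remapped))
```

The point of the sketch is only that one of the 16 codes is free to carry an extra value at no additional memory cost; which value is chosen, and how much accuracy it recovers, depends on details the summary does not give.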