Name: How Makora Generates CUDA Kernels That Beat Hand-Tuned Code | Researcher Conversations at GTC
Uploaded: 2026-05-27T22:15:06+00:00
Duration: 1596 s

Summary

Mohamed Abdelfattah discusses Makora's automated GPU kernel generation and its novel sequential Monte Carlo (SMC) speculative decoding techniques. The conversation covers performance optimization, reward hacking mitigation, and hardware-specific advantages of AMD versus Nvidia GPUs.

Makora automates high-performance GPU kernel generation and system-level AI inference optimization.
Sequential Monte Carlo speculative decoding achieves a 5x speedup over an SGLang baseline by maintaining multiple drafts in parallel.
Makora differentiates itself by selling end-to-end performance rather than only a code-generation compiler.
Research on FP4 quantization with redundant-zero remapping offers FP5-level accuracy at an FP4 memory footprint.
AMD hardware offers advantages for certain optimizations because of its shared FP6/FP4 data path.
Makora's eval pipeline detects reward hacking and is sold as a service to other companies.
Future plans include expanding to training and reinforcement learning with a user-friendly deployment engine.
Open-source releases are planned for research components like SMC speculative decoding.
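For context on the FP4 point: a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit) has 16 codes, but +0 and -0 decode to the same value, so one code carries no extra information. The sketch below shows that redundancy and what reassigning the -0 code to another value would look like. It is an illustration under assumptions, not the method discussed in the talk: the summary does not say which value the research maps the spare code to, so the `remap_negative_zero` parameter and the value 5.0 are hypothetical, and the FP5-accuracy claim is not reproduced here.

```python
# E2M1 (FP4) magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def fp4_codebook(remap_negative_zero=None):
    """Return the 16 values addressable by a 4-bit E2M1 code.

    Codes 0-7 decode to +magnitude, codes 8-15 to -magnitude, so code 8
    is -0.0, a duplicate of zero. If remap_negative_zero is given
    (a hypothetical parameter for illustration), code 8 decodes to that
    value instead, giving 16 distinct representable values.
    """
    values = list(E2M1_MAGNITUDES) + [-m for m in E2M1_MAGNITUDES]
    if remap_negative_zero is not None:
        values[8] = remap_negative_zero
    return values


def quantize(x, codebook):
    """Round x to the nearest codebook entry; return (code, decoded value).

    Ties resolve to the lowest code index.
    """
    code = min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))
    return code, codebook[code]


if __name__ == "__main__":
    standard = fp4_codebook()
    remapped = fp4_codebook(remap_negative_zero=5.0)  # hypothetical spare value
    print(len(set(standard)), len(set(remapped)))  # distinct values: 15 vs 16
    print(quantize(5.0, standard), quantize(5.0, remapped))
```

With the standard codebook, 5.0 is rounded down to 4.0; with the hypothetical remap, the otherwise wasted code represents it exactly, at no extra bits per value.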
