NVIDIA GPUs deliver poor matrix-vector performance for inference.
From Hopper to Blackwell, the ratio of matrix-matrix to matrix-vector performance on NVIDIA's GPUs has worsened, with matrix-vector performance falling further behind matrix-matrix performance. This makes the GPUs inefficient for AI inference workloads that rely heavily on matrix-vector operations, an inefficiency that Positron's architecture addresses.
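
To illustrate why matrix-vector and matrix-matrix operations stress hardware so differently, the following is a minimal sketch that compares their arithmetic intensity (floating-point operations per byte of memory traffic). The function name, the 4096 x 4096 weight matrix, and the 2-byte element size are assumptions chosen for illustration. They are not taken from any NVIDIA or Positron specification, and the simple traffic model ignores caching and data reuse.

```python
def arithmetic_intensity(rows, cols, batch, bytes_per_element=2):
    """FLOPs per byte for a (rows x cols) weight matrix times a (cols x batch) input.

    Simplified model (an assumption): every weight, input, and output
    element crosses the memory bus exactly once.
    """
    # One multiply and one add per weight, for each input column.
    flops = 2 * rows * cols * batch
    # Weights + inputs + outputs, in bytes.
    bytes_moved = bytes_per_element * (rows * cols + cols * batch + rows * batch)
    return flops / bytes_moved


# Hypothetical layer size, chosen only for illustration.
N = 4096

# Matrix-vector: a single input column.
matrix_vector = arithmetic_intensity(N, N, batch=1)

# Matrix-matrix: as many input columns as the matrix has rows.
matrix_matrix = arithmetic_intensity(N, N, batch=N)

print(f"matrix-vector: {matrix_vector:.2f} FLOPs per byte")
print(f"matrix-matrix: {matrix_matrix:.2f} FLOPs per byte")
```

Under these assumptions the matrix-vector case performs about one operation per byte moved, while the matrix-matrix case performs over a thousand. In this model, matrix-vector speed therefore depends on memory bandwidth rather than on peak compute, so hardware whose compute grows faster than its bandwidth would show the kind of widening ratio described above.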