
SageAttention


Project Description

Quantized attention that achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without degrading end-to-end metrics across language, image, and video models.

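A minimal usage sketch, assuming the package exposes the drop-in sageattn(q, k, v) kernel described in the project's README; the tensor shapes and keyword argument below are illustrative assumptions, not part of this listing:

    import torch
    from sageattention import sageattn  # assumed import path from the project README

    # Illustrative shapes: (batch, heads, sequence length, head dim), fp16 on GPU
    q = torch.randn(2, 16, 4096, 128, dtype=torch.float16, device="cuda")
    k = torch.randn(2, 16, 4096, 128, dtype=torch.float16, device="cuda")
    v = torch.randn(2, 16, 4096, 128, dtype=torch.float16, device="cuda")

    # Intended as a drop-in replacement for a standard scaled-dot-product
    # attention call; quantization happens inside the kernel, so the
    # surrounding model code does not need to change.
    out = sageattn(q, k, v, is_causal=False)  # is_causal is an assumed flag
    print(out.shape)  # expected: torch.Size([2, 16, 4096, 128])
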
Project Information

Created on 10/3/2024
Updated on 9/26/2025