AIbase

SageAttention

Public

Quantized attention that achieves speedups of 2.1-3.1x over FlashAttention2 and 2.7-5.1x over xformers, without losing end-to-end metrics across various models.
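The core idea behind this kind of speedup is to run the attention score matmul in low-precision integer arithmetic instead of FP16/FP32. The sketch below is an illustrative, self-contained numpy version of per-row INT8 quantized attention, not SageAttention's actual CUDA implementation (which also handles K smoothing, tiling, and FP8/FP16 accumulation); all function names here are hypothetical.

```python
import numpy as np

def quantize_int8(x):
    # Per-row symmetric INT8 quantization: scale each row so its
    # max absolute value maps to 127.
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def quantized_attention(Q, K, V):
    # Quantize Q and K to INT8, compute the score matmul in INT32
    # (the cheap part on hardware), then dequantize and finish the
    # softmax and value product in full precision.
    Qq, q_scale = quantize_int8(Q)
    Kq, k_scale = quantize_int8(K)
    scores = Qq.astype(np.int32) @ Kq.astype(np.int32).T
    scores = scores.astype(np.float32) * (q_scale * k_scale.T)
    scores /= np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

Because only Q and K are quantized and the softmax plus the value product stay in full precision, the output closely tracks unquantized attention, which is why end-to-end metrics are preserved.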

Created: 2024-10-03T17:33:18
Updated: 2025-03-27T08:33:32
Stars: 1.4K (+9)