linear-attention
RWKV (pronounced RwaKuv) is an RNN with strong LLM performance that can also be trained in parallel like a GPT transformer. The current generation is RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear-time inference, constant space (no KV cache), fast training, effectively unlimited ctx_len, and free sentence embedding (see the sketch after this list).
[NeurIPS 2024] Official code of "LION: Linear Group RNN for 3D Object Detection in Point Clouds"
Explorations into the recently proposed Taylor Series Linear Attention
Implementation of Agent Attention in Pytorch
Semantic segmentation of remote sensing images
CUDA implementation of autoregressive linear attention, with all the latest research findings
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral)
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
Implementation of: Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484)
RWKV Wiki website (archived, please visit official wiki)
[ICML 2024] Official implementation of "LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions."
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
LEAP: Linear Explainable Attention in Parallel for causal language modeling with O(1) path length, and O(1) inference
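Most of the repositories above build on the same basic idea: replace softmax attention with a form whose per-token state has a fixed size, so inference runs in linear time and constant memory. The following minimal NumPy sketch shows that generic (unnormalized) linear-attention recurrence and why no KV cache is needed; the function name `linear_attention_step` and all shapes are illustrative assumptions, not the exact update rule of RWKV or any repo listed here.

```python
import numpy as np

def linear_attention_step(state, q, k, v):
    """One recurrent step of (unnormalized) linear attention.

    The running state is a (d_k, d_v) matrix accumulating the outer
    products k_t v_t^T seen so far. Its size does not grow with the
    sequence length, which is what "constant space, no KV cache"
    refers to. Illustrative sketch only, not any repo's exact rule.
    """
    state = state + np.outer(k, v)   # accumulate k v^T into the fixed-size state
    out = q @ state                  # read out the current token with its query
    return state, out

# Toy usage: process a sequence token by token with O(1) memory.
rng = np.random.default_rng(0)
d_k, d_v, seq_len = 8, 8, 16
state = np.zeros((d_k, d_v))
for _ in range(seq_len):
    q, k, v = rng.normal(size=d_k), rng.normal(size=d_k), rng.normal(size=d_v)
    state, out = linear_attention_step(state, q, k, v)
```

Because the state update is associative, the same computation can also be carried out in parallel over the whole sequence at training time, which is the "trained like a GPT transformer" property mentioned in the RWKV entry.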