
vit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python
2921
6 days ago

Quantized Attention achieves speedups of 2-5x over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.

Cuda
2238
15 days ago

[CVPR 2021] Official PyTorch implementation of "Transformer Interpretability Beyond Attention Visualization", a novel method for visualizing the classifications made by Transformer-based networks.

Jupyter Notebook
1915
2 years ago

[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Jupyter Notebook
1191
2 years ago

SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.

Cuda
684
7 days ago

Extract video features from raw videos using multiple GPUs. Supports RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

Python
609
7 months ago

A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer"

Python
540
4 years ago

[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).

Jupyter Notebook
316
21 days ago

A practical application of the Transformer (ViT) to 2-D physiological signal (EEG) classification tasks; also applicable to EMG, EOG, ECG, etc. Includes attention over both the spatial dimension (channel attention) and the temporal dimension, plus a Python implementation of common spatial pattern (CSP), an efficient feature-enhancement method.

Python
309
2 years ago

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Python
297
4 years ago

FFCS course registration made hassle-free for VITians. Search courses and visualize the timetable on the go!

JavaScript
295
2 months ago

PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.

Python
284
2 years ago