vit
pix2tex: Using a ViT to convert images of equations into LaTeX code.
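A minimal usage sketch for pix2tex, assuming the Python API its README describes (a LatexOCR class callable on a PIL image); the file name and output are illustrative.

```python
from PIL import Image
from pix2tex.cli import LatexOCR  # assumed import path, per the pix2tex README

# Load a rendered-equation image and run the ViT-based OCR model on it.
img = Image.open("equation.png")  # illustrative path
model = LatexOCR()                # loads pretrained weights
print(model(img))                 # prints the predicted LaTeX code, e.g. \frac{a}{b}
```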
A comprehensive paper list of Vision Transformer/Attention works, including papers, code, and related websites
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
An open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Quantized attention that achieves speedups of 2-5x and 3-11x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
Turn any computer or edge device into a command center for your computer vision projects.
A paper list of some recent Transformer-based CV works.
🤖 PaddleViT: State-of-the-art Visual Transformer and MLP Models for PaddlePaddle 2.0+
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
SpargeAttention: A training-free sparse attention that can accelerate any model inference.
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer"
SimpleAICV: PyTorch training and testing examples.
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
A practical application of a Transformer (ViT) to 2-D physiological signal (EEG) classification, which can also be tried on EMG, EOG, ECG, etc. It includes attention over both the spatial dimension (channel attention) and the temporal dimension, plus Common Spatial Pattern (CSP), an efficient feature-enhancement method, implemented in Python.
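The item above mentions channel (spatial) attention over EEG signals; the following is a hypothetical, minimal squeeze-and-excitation style sketch of that idea in PyTorch, not the repository's actual code. The electrode count and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style attention over EEG channels (the 'spatial' dimension)."""
    def __init__(self, n_channels=22, reduction=4):  # 22 electrodes is an assumed example
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(n_channels // reduction, n_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):               # x: (batch, channels, time)
        w = self.fc(x.mean(dim=-1))     # average over time -> per-channel weights in (0, 1)
        return x * w.unsqueeze(-1)      # reweight each EEG channel

eeg = torch.randn(8, 22, 1000)          # 8 trials, 22 electrodes, 1000 time samples
print(ChannelAttention()(eeg).shape)    # torch.Size([8, 22, 1000])
```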
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
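As an illustration of the paper's "16x16 words" idea, here is a minimal PyTorch sketch of the patch-embedding step (an illustrative re-implementation, not the paper's reference code): the image is split into 16x16 patches and each patch is linearly projected into a token.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split an image into 16x16 patches and project each patch to an embedding (token)."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A conv with kernel = stride = patch_size is equivalent to a per-patch linear projection.
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, 224, 224)
        x = self.proj(x)                       # (B, 768, 14, 14)
        return x.flatten(2).transpose(1, 2)    # (B, 196, 768) -> 196 "words" per image

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 768])
```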
FFCS course registration made hassle-free for VITians. Search courses and visualize the timetable on the go!
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as fundamental vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2