multi-modal-learning

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python
1581
8 months ago

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python
1309
2 years ago

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Python
712
2 years ago

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

664
2 years ago

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Python
618
2 months ago

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ Support visual intelligence development!

Python
451
1 year ago

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

Python
322
21 days ago

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

Python
251
1 year ago

[CVPR 2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

Python
186
5 years ago

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

Python
158
1 year ago

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Python
146
1 year ago

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Python
136
1 year ago

[ICCV 2023] The official code for "Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation"

Python
135
2 months ago

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Python
130
1 year ago

[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Python
108
10 months ago