multi-modal-learning

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python
1562
4 months ago

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python
1310
1 year ago

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Python
707
2 years ago

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

662
2 years ago

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Python
582
2 months ago

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

Python
449
9 months ago

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

Python
288
1 month ago

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

Python
243
1 year ago

[CVPR 2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

Python
183
5 years ago

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

Python
157
1 year ago

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Python
143
9 months ago

Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Python
129
8 months ago

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

Python
117
1 year ago

[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Python
100
6 months ago

A Python tool to perform deep learning experiments on multimodal remote sensing data.

Python
88
3 years ago