multi-modal-learning

mlfoundations / open_clip

An open source implementation of CLIP.

deep-learning PyTorch computer-vision language-model multi-modal-learning contrastive-loss zero-shot-classification pretrained-models

Python

12699

1172

13 days ago

OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

chinese computer-vision multi-modal-learning natural-language-processing PyTorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language deep-learning multi-modal contrastive-loss transformers coreml-models

Jupyter Notebook

5546

521

1 month ago

lyuchenyang / Macaw-LLM

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

language-model multi-modal-learning natural-language-processing deep-learning machine-learning neural-networks

Python

1581

130

9 months ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python

1305

2 years ago

lucidrains / x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

artificial-intelligence deep-learning contrastive-learning zero-shot-learning multi-modal-learning

Python

715

2 years ago

jokieleung / awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Awesome Lists vqa multi-modal multi-modal-learning

664

2 years ago

InternRobotics / EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

3d-vision computer-vision multi-modal-learning Robotics

Python

634

4 months ago

kyegomez / zeta

Build high-performance AI models with modular building blocks

artificial-intelligence multi-modal transformers deep-learning gpt4 llama2 multi-agent-systems multi-modal-learning multi-platform PyTorch speech-recognition transformer

Python

554

5 days ago

DmitryRyumin / CVPR-2023-24-Papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition segmentation self-supervised-learning video-synthesis cvpr2024

Python

454

1 year ago

zjukg / KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

cross-modal-retrieval Entity resolution image-classification image-generation information-extraction knowledge-graph knowledge-graph-embeddings large-language-models multi-modal-learning paper-list survey surveys visual-question-answering awsome

447

10 months ago

zhengli97 / PromptKD

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

cvpr2024 multi-modal-learning prompt-learning vision-language-model knowledge-distillation clip

Python

330

1 month ago

Ysz2022 / NeRCo

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

neural-representation multi-modal-learning iccv iccv2023

Python

252

2 years ago

moabarar / nemar

[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

multimodal image-to-image-translation multi-modal multi-modal-learning affine-transformation deep-learning cnn PyTorch image-registration cvpr2020

Python

189

5 years ago

GuanRunwei / Achelous

The official repository of Achelous and Achelous++

multi-modal-learning multi-task-learning object-detection object-tracking point-cloud-segmentation semantic-segmentation

Python

159

1 year ago

huggingface / chug

Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

computer-vision datasets distributed-training document-understanding multi-modal-learning pdf-document

Python

159

2 years ago

qizekun / ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Point cloud multi-modal-learning representation-learning self-supervised-learning

Python

147

1 year ago

wjun0830 / CGDETR

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

computer-vision detr multi-modal-learning PyTorch video-understanding

Python

140

1 year ago

shikras / d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

multi-modal-learning object-detection referring-expression-comprehension vision-language dataset open-vocabulary-detection

Python

138

2 years ago