multimodal-learning
Reading list for research topics in multimodal machine learning
An open-source framework for training large multimodal models.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
A curated list of multimodal-related research.
[CVPR 2024 & TPAMI 2025] UniRepLKNet
A Comparative Framework for Multimodal Recommender Systems
An official implementation of "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
ICCV 2023 Papers: a collection of cutting-edge research from ICCV 2023, a leading computer vision conference, with code included.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
A collection of resources on applications of multi-modal learning in medical imaging.
Papers, code and datasets about deep learning and multi-modal learning for video analysis
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
A multimodal model for text and tabular data, using HuggingFace Transformers as the building block for the text modality (a minimal sketch of this pattern follows the list)
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
A curated list of awesome vision and language resources (still under construction... stay tuned!)
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
ICASSP 2023-2024 Papers: a collection of influential research papers from the ICASSP 2023 and 2024 conferences on acoustics, speech, and signal processing, with code included.
Multi-modality pre-training
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
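
Several entries above, such as the text-and-tabular model, follow a common late-fusion pattern: encode the text with a pretrained transformer, then concatenate the pooled embedding with the tabular features before a task head. Below is a minimal sketch of that pattern; the class name, feature sizes, and the choice of distilbert-base-uncased are illustrative assumptions, not the API of any repository listed here.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TextTabularClassifier(nn.Module):
    """Late fusion of a pretrained text encoder with raw tabular features."""

    def __init__(self, text_model="distilbert-base-uncased",
                 num_tabular_features=8, num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(text_model)
        hidden = self.encoder.config.hidden_size
        # Classifier over the concatenation [text embedding ; tabular features].
        self.head = nn.Linear(hidden + num_tabular_features, num_classes)

    def forward(self, input_ids, attention_mask, tabular):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        text_emb = out.last_hidden_state[:, 0]  # first-token ([CLS]-style) embedding
        return self.head(torch.cat([text_emb, tabular], dim=-1))

# Usage: one text sample plus 8 (here random) tabular features.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
batch = tokenizer(["a sample review"], return_tensors="pt")
model = TextTabularClassifier()
logits = model(batch["input_ids"], batch["attention_mask"], torch.randn(1, 8))
```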