multimodal-datasets

LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning deep-learning-library image-captioning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering multimodal-datasets multimodal-deep-learning

Jupyter Notebook

10933

1067

1 year ago

remyxai / VQASynth

Compose multimodal datasets 🎹

dataset-generation multimodal-datasets multimodal-deep-learning synthetic-dataset-generation

Python

484

2 months ago

drmuskangarg / Multimodal-datasets

This repository is built in association with our position paper on "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release, we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for reuse in subsequent problem domains.

multimodal-datasets

309

4 years ago

AnkurDeria / MFT

PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

deep-learning multimodal-datasets multimodal-deep-learning remote-sensing transformer-models

Jupyter Notebook

220

2 years ago

wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

clip-model histopathology multimodal-datasets vlm

Python

168

2 years ago

yuanxiaosc / Multimodal-short-video-dataset-and-baseline-classification-model

A dataset of 500,000 multimodal short videos and baseline models (TensorFlow 2.0).

multimodal-datasets classification-model Tensorflow

Jupyter Notebook

132

6 years ago

roboflow / rf100-vl

Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"

machine-vision multimodal-datasets object-detection

Python

2 days ago

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833

cross-modal multimodal-datasets multimodal-deep-learning multimodal-pre-trained-model transformer-models vision-language-pretraining

4 months ago

piresramon / gpt-4-enem

Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.

artificial-intelligence llm-inference large-language-models multimodal-datasets

Python

10 months ago

Yuco-Z / Awesome-Multi-Modal-Dialog

[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics

Awesome Lists dialogue multimodal multimodal-deep-learning multimodal-datasets multimodal-learning

8 months ago

JunweiLiang / FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

visual-question-answering vision-and-language multimodal-deep-learning multimodal-datasets

Python

6 years ago

ddw2AIGROUP2CQUPT / Large-Scale-Multimodal-Face-Datasets

Millions-Level Face/Human-Scene Image-Text Datasets

multimodal-datasets

4 months ago

OlehOnyshchak / pyWikiMM

Collects a multimodal dataset of Wikipedia articles and their images

wikipedia multimodal multimodality multimodal-datasets multimodal-learning database data-cleaning data-collection data-processing

Python

3 years ago

pspdada / SENTINEL

[ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention".

multimodal-datasets multimodal-large-language-models preference-alignment image-captioning

Python

2 months ago

deepmancer / vlm-toolbox

Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation

clip deep-learning deep-learning-library multimodal-datasets multimodal-deep-learning multimodal-learning prompt-tuning vision-and-language vision-framework vision-language-transformer zero-shot-classification PyTorch transformers

Jupyter Notebook

8 months ago

lujiaying / MUG-Bench

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

multimodal-datasets multimodal-learning

Python

2 years ago

NUSTM / EMDRC

Towards Explainable Multimodal Depression Recognition for Clinical Interviews

mental-health dataset datasets affective-computing multimodal-datasets

8 months ago

clp-research / language-models-multimodal-tasks

Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"

language-model multimodal-datasets multimodal-learning

Python

2 years ago