
multimodal-datasets

This repository was built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release, we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for reuse in subsequent problem domains.

286
3 years ago

PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

Jupyter Notebook
207
1 year ago

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

Python
158
1 year ago

A dataset of 500,000 multimodal short videos, with baseline models (TensorFlow 2.0).

Jupyter Notebook
128
6 years ago

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).

72
2 years ago

Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"

Python
48
9 days ago

Code and data to evaluate LLMs on ENEM, the main standardized Brazilian university admission exam.

Python
46
4 months ago

[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics

39
3 months ago

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

Python
32
6 years ago

Millions-Level Face/Human-Scene Image-Text Datasets

14
3 months ago

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

Python
9
1 year ago

Towards Explainable Multimodal Depression Recognition for Clinical Interviews

6
3 months ago

Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)

Python
5
4 years ago

Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"

Python
3
2 years ago

Create a large, well-managed, and clean dataset for the task of music composition for video soundtracks.

Jupyter Notebook
3
2 years ago