video-question-answering

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python · 3214 stars · updated 3 months ago

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

Python · 718 stars · updated 2 years ago

Official code for the Goldfish model (long video understanding) and MiniGPT4-video (short video understanding).

Python · 614 stars · updated 4 months ago

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

Python · 296 stars · updated 1 year ago

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

Python · 226 stars · updated 2 years ago

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python · 215 stars · updated 7 months ago

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Python · 187 stars · updated 3 years ago

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering

Python · 186 stars · updated 1 year ago

Python · 157 stars · updated 4 months ago

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python · 149 stars · updated 9 months ago

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Python · 133 stars · updated 1 year ago

Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.

Python · 129 stars · updated 3 months ago

[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

Python · 129 stars · updated 2 years ago

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Jupyter Notebook · 120 stars · updated 2 years ago

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Python · 117 stars · updated 4 months ago

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Python · 74 stars · updated 25 days ago

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

Python · 68 stars · updated 10 months ago

[CVPR 2022] A large-scale public benchmark dataset for video question answering, focusing on evidence and commonsense reasoning, along with the code for our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering" (CVPR 2022).

Python · 67 stars · updated 19 days ago