video-language-pretraining

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python

3054

280

1 year ago

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

Python

149

7 months ago

Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]

Python

116

1 year ago

[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

Python

6 months ago

A Survey on video and language understanding.

2 years ago

ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model

Python

2 years ago

The official GitHub page for the survey paper "Self-Supervised learning for Videos: A survey"

2 years ago