video-captioning

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Python
970
2 years ago

pytorch implementation of video captioning

Python
399
6 years ago

Video to Text: natural language description generator for a given video. [Video Captioning]

Python
343
3 years ago

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.

Python
307
3 months ago
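
A purely hypothetical sketch of what a "plug-and-play" temporal selection module can look like in front of a frozen multimodal foundation model; the class name, scoring mechanism, and dimensions below are assumptions for illustration and are not the paper's actual TWM implementation.

```python
# Hypothetical plug-in sketch: score per-frame visual tokens and keep the top-k
# (in temporal order) before they reach a frozen multimodal foundation model.
# NOT the paper's TWM; the scorer here is a simple query-agnostic linear layer.
import torch
import torch.nn as nn

class FrameTokenSelector(nn.Module):
    def __init__(self, dim=768, keep=8):
        super().__init__()
        self.keep = keep
        self.score = nn.Linear(dim, 1)  # assumed relevance scorer

    def forward(self, frame_tokens):
        # frame_tokens: (batch, num_frames, dim) visual tokens from the backbone
        scores = self.score(frame_tokens).squeeze(-1)                    # (batch, num_frames)
        idx = scores.topk(self.keep, dim=1).indices.sort(dim=1).values   # keep temporal order
        idx = idx.unsqueeze(-1).expand(-1, -1, frame_tokens.size(-1))
        return torch.gather(frame_tokens, 1, idx)                        # (batch, keep, dim)

tokens = torch.randn(2, 32, 768)           # 32 frame tokens per video
print(FrameTokenSelector()(tokens).shape)  # torch.Size([2, 8, 768]) -> fed to the MFM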

[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale

Jupyter Notebook
190
1 year ago

[ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

Jupyter Notebook
170
4 years ago

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

Python
166
6 years ago
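
A minimal sketch of the kind of encoder-decoder pipeline the entry above describes (a Sequence to Sequence -- Video to Text style model), assuming frame features have already been extracted by a CNN; the module names, dimensions, and greedy decoding loop are illustrative, not the repository's actual API.

```python
# Minimal S2VT-style encoder-decoder sketch (illustrative, not the repo's code).
import torch
import torch.nn as nn

class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512, vocab_size=10000, bos_id=1):
        super().__init__()
        self.bos_id = bos_id
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)  # reads frame features
        self.embed = nn.Embedding(vocab_size, hidden)                # word embeddings
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)     # generates the caption
        self.proj = nn.Linear(hidden, vocab_size)                    # hidden state -> vocab logits

    @torch.no_grad()
    def greedy_caption(self, frame_feats, max_len=20):
        # frame_feats: (batch, num_frames, feat_dim) pre-extracted CNN features
        _, state = self.encoder(frame_feats)         # video summary lives in the final LSTM state
        tok = torch.full((frame_feats.size(0), 1), self.bos_id, dtype=torch.long)
        out = []
        for _ in range(max_len):
            emb = self.embed(tok)                    # (batch, 1, hidden)
            dec, state = self.decoder(emb, state)
            tok = self.proj(dec).argmax(-1)          # pick the most likely next word
            out.append(tok)
        return torch.cat(out, dim=1)                 # (batch, max_len) token ids

feats = torch.randn(2, 30, 2048)  # 2 videos, 30 frames of features each
print(VideoCaptioner().greedy_caption(feats).shape)  # torch.Size([2, 20])
```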

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Python
133
1 year ago

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

Python
129
3 months ago

A summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*

Jupyter Notebook
122
1 year ago

A curated list of multimodal-captioning-related research (including image captioning, video captioning, and text captioning)

110
3 years ago

[ECCV 2020] PyTorch code of MMT (a multimodal transformer captioning model) on TVCaption dataset

Python
90
2 years ago

A video captioning deep learning model implemented on the PyTorch platform with a Transformer architecture. The video captioning task: given an input video, output one sentence describing the content of the whole video (assuming the video is short enough to be described in a single sentence). The main goal of this repo is to help visually impaired people enjoy online videos and perceive their surroundings, and to promote the development of "accessible video".

Python
87
3 years ago

A PyTorch implementation of state-of-the-art video captioning models from 2015-2019 on the MSVD and MSRVTT datasets.

Python
71
2 years ago

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

Python
68
5 months ago

[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Jupyter Notebook
66
1 year ago

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

Python
62
3 years ago
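
A generic symmetric video-text contrastive (InfoNCE) loss of the kind this line of work builds on; this is not CrossCLR's exact objective (which adds intra-modality and sample-weighting terms), only the standard cross-modal baseline with assumed embedding shapes.

```python
# Generic video-text contrastive (InfoNCE) loss sketch -- a common baseline for
# cross-modal representation learning; not CrossCLR's exact objective.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(video_emb, text_emb, temperature=0.07):
    # video_emb, text_emb: (batch, dim) embeddings of paired clips and captions
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                  # (batch, batch) similarity matrix
    targets = torch.arange(v.size(0))               # matching pairs lie on the diagonal
    loss_v2t = F.cross_entropy(logits, targets)     # video -> text retrieval direction
    loss_t2v = F.cross_entropy(logits.T, targets)   # text -> video retrieval direction
    return (loss_v2t + loss_t2v) / 2

video_emb, text_emb = torch.randn(8, 256), torch.randn(8, 256)
print(cross_modal_contrastive_loss(video_emb, text_emb).item())
```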

Video captioning baseline models on the Video2Commonsense dataset.

Python
56
4 years ago