# captioning-videos

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python
3285
7 months ago

[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

Python
69
5 years ago

[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Python
63
21 days ago

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Python
61
4 years ago

PyTorch Implementation of Consensus-based Sequence Training for Video Captioning

Python
60
7 years ago

PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Python
45
5 years ago

Transcription and annotation interface for recorded audio or video files

JavaScript
39
19 days ago

A tool for downloading from public image boards (which allow scraping) / preview your images & tags / edit your images & tags. Additional tabs for downloading other desired code repositories as well as S.O.T.A. diffusion and auto-tag/caption models for your purposes. Custom datasets can be added!

Python
38
16 days ago

Video to Language Challenge (MSR-VTT Challenge 2016)

Jupyter Notebook
32
8 years ago

An image and video description generator using a CNN-RNN-based architecture.

Jupyter Notebook
23
1 year ago

M-VAD Names Dataset. Multimedia Tools and Applications (2019)

Python
20
6 years ago

Sample app to add captions to an uploaded video. From api.video (https://api.video)

JavaScript
11
3 years ago

Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP2025)

Python
7
4 months ago

Video Search using Natural Language

Python
3
7 years ago

Generate TikTok- and Instagram-tailored captions and hashtags for your videos using the power of some super creative robots up in the clouds ☁️ 🤖 💬 ☁️

Python
3
1 year ago

A multilingual automatic speech recognition and video captioning tool using faster whisper. Supports real-time translation to English. Runs on a consumer-grade CPU.

HTML
2
6 days ago

An AI-powered, fully automated n8n workflow that converts a single text prompt into scroll-stopping YouTube Shorts using dynamic visuals, dramatic TTS narration, and real-time editing, all self-hosted using Docker, Ollama, MinIO, and open-source tools.

1
1 month ago