captioning

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python
5585
4 months ago

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python
2629
8 hours ago

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

Jupyter Notebook
788
5 days ago

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

Python
257
6 years ago

CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)

Python
198
2 years ago

Audio Captioning datasets for PyTorch.

Python
120
1 month ago

A tennis dataset and models for event detection and commentary generation

Python
101
2 months ago

VisText is a benchmark dataset for semantically rich chart captioning.

Jupyter Notebook
95
9 days ago

Medical image captioning using OpenAI's CLIP

Jupyter Notebook
83
2 years ago

Python code for handling the Clotho dataset.

Python
82
5 years ago

A base TensorFlow project for medical report generation

Python
70
6 years ago

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

Python
68
3 months ago

[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

Python
63
22 days ago

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Python
61
4 years ago

A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.

Python
60
9 months ago

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python
55
1 month ago

A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering

Python
43
5 years ago

Using LLMs and pre-trained caption models for super-human performance on image captioning.

Python
42
2 years ago

Audio captioning baseline system for DCASE 2020 challenge.

Python
38
2 years ago