captioning
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Streamlines the fine-tuning process for multimodal models such as PaliGemma 2, Florence-2, and Qwen2.5-VL
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
CapDec: SOTA zero-shot image captioning using CLIP and GPT-2, EMNLP 2022 (Findings)
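As a rough illustration of CapDec's core trick (not the repo's actual code): the decoder is trained on text alone, with Gaussian noise added to the CLIP text embedding so that the nearby CLIP image embedding can stand in for it at inference. The projection shapes and noise scale below are assumptions.

```python
# Minimal sketch of CapDec's training signal: the caption decoder only ever
# sees CLIP *text* embeddings, perturbed with Gaussian noise so that CLIP
# *image* embeddings (which lie nearby in the shared space) work at test
# time. The noise scale and projection sizes are illustrative assumptions.
import torch
import torch.nn as nn

class PrefixProjector(nn.Module):
    """Map a CLIP embedding to a prefix for a GPT-2-style decoder."""
    def __init__(self, clip_dim: int = 512, gpt_dim: int = 768, prefix_len: int = 10):
        super().__init__()
        self.proj = nn.Linear(clip_dim, gpt_dim * prefix_len)
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim

    def forward(self, clip_emb: torch.Tensor, noise_std: float = 0.016) -> torch.Tensor:
        # Noise injection: the text-only training trick at the heart of CapDec.
        noised = clip_emb + noise_std * torch.randn_like(clip_emb)
        return self.proj(noised).view(-1, self.prefix_len, self.gpt_dim)

prefix = PrefixProjector()(torch.randn(2, 512))  # -> (2, 10, 768) decoder prefix
```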
A Tennis dataset and models for event detection & commentary generation
VisText is a benchmark dataset for semantically rich chart captioning.
Fully-Convolutional Point Networks for Large-Scale Point Clouds
Medical image captioning using OpenAI's CLIP
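A minimal sketch of the general CLIP-for-captioning pattern, assuming a retrieval setup: embed the image and a pool of candidate captions, then keep the best-scoring match. The checkpoint, file name, and candidate captions are illustrative, not taken from the repo.

```python
# Illustrative CLIP-based caption retrieval: score candidate captions
# against an image and keep the best match. All inputs here are assumed.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("xray.png")  # hypothetical input image
candidates = ["chest X-ray, no acute findings", "abdominal CT scan"]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape (1, num_candidates)
print(candidates[logits.argmax().item()])
```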
Python code for handling the Clotho dataset.
A base TensorFlow project for medical report generation
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.
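For orientation, a hedged sketch of the kind of API call such a tool makes, using the OpenAI Python client; the model name, file name, and prompt are assumptions, and the repo's own wiring may differ.

```python
# Send a local image to the OpenAI chat completions API and ask for a
# description. Model name and prompt are assumed, not from the repo.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("photo.jpg", "rb") as f:  # hypothetical local image
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # assumed vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```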
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
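The AoA gating step itself is compact. A minimal PyTorch sketch (layer names and sizes are illustrative, not the repo's code): the attention result is concatenated with the query, mapped to an "information" vector and a sigmoid gate, and the gate modulates the information.

```python
# Minimal sketch of the Attention-on-Attention (AoA) gating step from
# Huang et al., "Attention on Attention for Image Captioning" (ICCV 2019).
import torch
import torch.nn as nn

class AoA(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Both branches see the attention result concatenated with the query.
        self.info = nn.Linear(2 * dim, dim)   # "information" vector i
        self.gate = nn.Linear(2 * dim, dim)   # sigmoid gate g

    def forward(self, attended: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        x = torch.cat([attended, query], dim=-1)
        i = self.info(x)
        g = torch.sigmoid(self.gate(x))
        return g * i                          # gated information flow

aoa = AoA(dim=512)
out = aoa(torch.randn(4, 512), torch.randn(4, 512))  # -> shape (4, 512)
```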
Using LLMs and pre-trained caption models for super-human performance on image captioning.
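One common recipe behind this idea, sketched under assumptions (the prompt wording and model name are mine, not the repo's): sample several candidate captions from a base captioner, then ask an LLM to fuse the consistent details into a single caption.

```python
# Hedged sketch of the "LLM over candidate captions" recipe: fuse several
# candidate captions into one description. Prompt and model are assumed.
from openai import OpenAI

def fuse_captions(candidates: list[str]) -> str:
    client = OpenAI()
    prompt = (
        "These captions all describe the same image:\n"
        + "\n".join(f"- {c}" for c in candidates)
        + "\nWrite a single caption that combines the consistent details."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

print(fuse_captions(["a dog on a beach", "a brown dog runs near waves"]))
```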
Audio captioning baseline system for DCASE 2020 challenge.