
visual-question-answering

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook
5192
8 months ago

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Python
2495
1 year ago

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Jupyter Notebook
1446
2 years ago

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

Python
1237
3 years ago

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Python
970
2 years ago

Bilinear attention networks for visual question answering

Python
545
1 year ago

Deep Modular Co-Attention Networks for Visual Question Answering

Python
452
4 years ago

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Jupyter Notebook
348
3 years ago

A lightweight, scalable, and general framework for visual question answering research

Python
322
4 years ago

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Jupyter Notebook
296
5 months ago

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Python
271
2 years ago

Strong baseline for visual question answering

Python
239
2 years ago

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

Python
184
6 months ago

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

Python
175
7 months ago

PyTorch implementation of the winning entry of the VQA Challenge Workshop at CVPR'17

Python
163
6 years ago

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Python
159
1 year ago