vqa

facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

PyTorch vqa pretrained-models multimodal deep-learning captioning dialog textvqa hateful-memes multi-tasking

Python

5585

940

4 months ago

OpenGVLab / InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

ChatGPT foundation-model gpt gpt-4 gradio husky image-captioning langchain llm multimodal vqa llama vicuna video-generation sam segment-anything click draggan

Python

3216

231

1 year ago

open-compass / VLMEvalKit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v large-language-models llava multi-modal openai vqa llm openai-api qwen gpt computer-vision PyTorch gpt4 ChatGPT clip vit evaluation claude gemini

Python

2921

476

6 days ago

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

captioning fine-tuning florence-2 multimodal objectdetection paligemma phi-3-vision transformers vision-and-language vqa qwen2-vl

Python

2629

216

13 hours ago

BDBC-KG-NLP / QA-Survey-CN

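The natural language processing research team at Beihang University's Advanced Innovation Center for Big Data has compiled a summary of research and applications in intelligent question answering. It covers knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), vision-based question answering (VisualQA), and machine reading comprehension (MRC), with each task type summarized separately for academia and industry.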

survey natural-language-processing question-answering kbqa vqa qa

1792

261

2 years ago

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa visual-question-answering faster-rcnn caffe image-captioning mscoco

Jupyter Notebook

1453

377

3 years ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python

1309

2 years ago

microsoft / Oscar

Oscar and VinVL

vision-and-language pre-training image-captioning vqa oscar

Python

1050

250

2 years ago

hila-chefer / Transformer-MM-Explainability

[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

transformers transformer vqa detr visualization explainability explainable-ai interpretability clip

Jupyter Notebook

863

113

2 years ago

hengyuan-hu / bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

vqa PyTorch

Python

759

181

1 year ago

Cadene / vqa.pytorch

Visual Question Answering in PyTorch

vqa deep-learning resnet PyTorch coco torch

Python

731

179

6 years ago

jayleicn / ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

PyTorch video-question-answering vqa vision-and-language cvpr2021

Python

721

2 years ago

jokieleung / awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Awesome Lists vqa multi-modal multi-modal-learning

664

2 years ago

OpenGVLab / Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot ChatGPT gradio large-language-models llm vqa multi-modality vision-language-model

Python

533

1 year ago