vqa

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python
5558
13 days ago

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

Python
3215
8 months ago

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python
2547
6 days ago

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python
2235
1 day ago

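A summary of research and applications in intelligent question answering, compiled by the natural language processing research team of the Big Data Advanced Innovation Center at Beihang University. It covers knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), visual question answering (VisualQA), and machine reading comprehension (MRC). For each task type, related work from both academia and industry is summarized.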

1779
2 years ago

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Jupyter Notebook
1446
2 years ago

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Python
1310
1 year ago

[ICCV 2021 Oral] Official PyTorch implementation of "Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers", a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

Jupyter Notebook
846
2 years ago

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

Python
757
1 year ago

Visual Question Answering in PyTorch

Python
728
5 years ago

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

Python
718
2 years ago

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

662
2 years ago

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Python
516
1 year ago

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

Python
501
4 years ago

Visual Q&A reading list

437
7 years ago

PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).

Python
424
4 years ago

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Jupyter Notebook
348
3 years ago

A lightweight, scalable, and general framework for visual question answering research

Python
322
4 years ago

Multimodal Question Answering in the Medical Domain: A Summary of Existing Datasets and Systems

287
2 years ago