textvqa

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Python

5595

941

5 months ago

Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.

Python

4 years ago

PyTorch DataLoader for many VQA datasets

Python

3 years ago

[PRL 2024] Code repository for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†).

Python

1 year ago