#textvqa
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Python
5585
4 months ago
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
Python
64
4 years ago
PyTorch DataLoader for many VQA datasets
Python
13
3 years ago
[PRL 2024] This is the code repo for our label-free pruning and retraining technique for autoregressive Text-VQA Transformers (TAP, TAP†).
Python
2
1 year ago