mscoco
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
CVNets: A library for training computer vision networks
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
This is an official implementation for "Contextual Transformer Networks for Visual Recognition".
This repository contains the source code of our work on designing efficient CNNs for computer vision
VarifocalNet: An IoU-aware Dense Object Detector
The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
Official ImageNet Model repository
SWA Object Detection
Video Platform for Action Recognition and Object Detection in PyTorch
[ECCV 2020] Boundary-preserving Mask R-CNN
Semantic Propositional Image Caption Evaluation
High-resolution Networks for the Fully Convolutional One-Stage Object Detection (FCOS) algorithm
Generate captions for images using a CNN-RNN model trained on the Microsoft Common Objects in COntext (MS COCO) dataset
A TensorFlow implementation of MobileNetV3 CenterNet that can be easily deployed on Android (MNN) and iOS (CoreML).
A tool for converting computer vision label formats.
Adds the SPICE metric to the coco-caption evaluation server code
Implementation of models in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models