#cross-modal

A curated list of different papers and datasets in various areas of audio-visual processing

715
1 year ago

PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

Python
560
2 years ago

Analyze unstructured data with Towhee: reverse image search, reverse video search, audio classification, question-answering systems, molecular search, etc.

Jupyter Notebook
486
1 year ago

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)

Python
217
4 months ago

Remote Sensing SAR-Optical Land-use Classification in PyTorch: high-resolution remote sensing semantic segmentation / land-cover segmentation / land-cover classification

Python
214
1 year ago

[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

Swift
124
2 years ago

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)

Python
99
1 year ago

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

Python
95
2 years ago

Unofficial implementation of Google DeepMind's paper `Objects that Sound`

Python
83
7 years ago

Code for the journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

Python
72
2 years ago

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).

72
2 years ago

[CVPR 2024] Code for "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"

Python
68
6 months ago

Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS 2023]

Python
61
1 year ago

Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning

Python
60
2 years ago