

cross-modal-learning

Official implementation of "CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding" (CVPR, 2022)

Python · 249 stars · updated 2 years ago

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Python · 236 stars · updated 5 months ago

【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

Python · 192 stars · updated 1 year ago

【CVPR'2023】Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

Python · 149 stars · updated 7 months ago

CVPR 2022: Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?

Python · 129 stars · updated 4 months ago

[CVPR 2023 Highlight 💡] Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision

Python · 125 stars · updated 2 years ago

[ICLR 2023] Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

Python · 101 stars · updated 10 months ago

[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.

Python · 41 stars · updated 6 months ago

In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.

Jupyter Notebook · 11 stars · updated 4 years ago

[IJBHI 2023] Official implementation of CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation, accepted to the IEEE Journal of Biomedical and Health Informatics (J-BHI), 2023.

Python · 8 stars · updated 1 year ago

Original PyTorch implementation of the paper "Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data", presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

Python · 8 stars · updated 3 years ago

This is a cross-modal benchmark for industrial anomaly detection.

Python · 8 stars · updated 4 days ago

Code for Limbacher, T., Özdenizci, O., & Legenstein, R. (2022). Memory-enriched computation and learning in spiking neural networks through Hebbian plasticity. arXiv preprint arXiv:2205.11276.

Python · 7 stars · updated 2 years ago

This project creates the T4SA 2.0 dataset, i.e., a large dataset for training visual sentiment-analysis models in the Twitter domain, built with a cross-modal student-teacher approach.

Jupyter Notebook · 4 stars · updated 2 years ago

An intentionally simple image-to-food cross-modal search. Created by Prithiviraj Damodaran.

4 stars · updated 3 years ago

CUCA: Predicting fine-grained cell types from histology images through cross-modal learning in spatial transcriptomics

Python · 1 star · updated 24 days ago