#audio-visual-learning

Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"

Python
1707
1 year ago
Python
408
4 months ago

[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)

Python
396
5 months ago

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models show significant performance improvements across QA, captioning, and retrieval tasks.

Python
307
3 months ago

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

Python
181
4 years ago

[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

Python
137
2 years ago

Co-Separating Sounds of Visual Objects (ICCV 2019)

Python
94
2 years ago

PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

Python
86
4 years ago

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Python
63
1 year ago

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

Python
51
1 year ago

ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

Python
43
1 year ago

[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line

Python
41
3 years ago

IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"

Python
38
5 months ago

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Python
35
2 years ago

Official implementation for AVGN

Python
34
2 years ago

Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)

Python
33
3 years ago

Unofficial implementation of the paper "Objects that Sound" (ECCV 2018).

Python
32
1 year ago

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

Python
31
9 months ago

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Python
29
2 years ago