#audio-visual-learning

Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"

Python
1757
2 years ago
Python
433
1 month ago

[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)

Python
399
9 months ago

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models show significant performance improvements across QA, captioning, and retrieval tasks.

Python
309
7 months ago

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

Python
188
4 years ago

[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

Python
143
2 years ago

Co-Separating Sounds of Visual Objects (ICCV 2019)

Python
96
2 years ago

[CVPR 2021] PyTorch implementation of the paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"

Python
89
4 years ago

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Python
66
2 years ago

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

Python
55
1 year ago

ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

Python
47
2 years ago

[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line

Python
43
3 years ago

IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"

Python
43
9 months ago

Official implementation for AVGN

Python
36
2 years ago

Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)

Python
35
3 years ago

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Python
35
2 years ago

An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).

Python
31
2 years ago

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Python
31
2 years ago

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

Python
31
1 year ago