audio-visual-learning

Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"

Python
1762
2 years ago
Python
435
6 days ago

[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)

Python
402
1 year ago

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models show significant performance improvements across QA, captioning, and retrieval tasks.

Python
310
8 months ago

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

Python
192
5 years ago

[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

Python
143
3 years ago

Co-Separating Sounds of Visual Objects (ICCV 2019)

Python
97
2 years ago

PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

Python
89
4 years ago

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Python
68
2 years ago

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

Python
57
1 year ago

ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

Python
47
2 years ago

IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"

Python
43
10 months ago

[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line

Python
42
3 years ago

Official implementation for AVGN

Python
37
3 years ago

Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)

Python
36
3 years ago

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Python
36
2 years ago

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

Python
33
1 year ago

FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition

Python
32
10 months ago

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Python
31
3 years ago

An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).

Python
31
2 years ago