audio-visual-learning

ali-vilab / dreamtalk

Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

audio-visual-learning face-animation talking-head video-generation

Python

1762

215

2 years ago

tanshuai0219 / EDTalk

[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation

audio-visual-learning face-animation talking-face-generation talking-head video-generation

Python

435

36

6 days ago

OpenNLPLab / AVSBench

[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)

audio-visual-learning

Python

402

36

1 year ago

xid32 / NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.

multimodal-large-language-models audio-visual-learning question-answering video-captioning

Python

310

30

8 months ago

YapengTian / AVE-ECCV18

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

audio-visual-learning

Python

192

32

5 years ago

alvinliu0 / HA2G

[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

audio-visual-learning cvpr2022

Python

143

9

3 years ago

rhgao / co-separation

Co-Separating Sounds of Visual Objects (ICCV 2019)

audio-visual-learning sound-separation cross-modality

Python

97

22

2 years ago

PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

distillation audio-visual-learning cvpr2021 contrastive-learning PyTorch video-recognition

Python

89

11

4 years ago

ttgeng233 / UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

audio-visual-learning multi-modal-learning

Python

68

6

2 years ago

roger-tseng / av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

audio-visual-learning representation-learning

Python

57

4

1 year ago

praveena2j / JointCrossAttentional-AV-Fusion

ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

affective-computing attention-model audio-visual-learning emotion emotion-recognition multimodal-learning

Python

47

9

2 years ago

praveena2j / Joint-Cross-Attention-for-Audio-Visual-Fusion

IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"

affective-computing attention attention-model audio-visual-learning emotion-recognition multimodal-learning

Python

43

11

10 months ago

jasongief / PSP_CVPR_2021

[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line

audio-visual-learning

Python

42

12

3 years ago

Official implementation for AVGN

audio-visual-learning

Python

37

3

3 years ago

stoneMo / EZ-VSL

Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)

self-supervised-learning audio-visual-learning

Python

36

10

3 years ago

MengyuanChen21 / CVPR2023-CMPAE

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

audio-visual-learning cvpr2023 video-understanding

Python

36

4

2 years ago

stoneMo / DeepAVFusion

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

attention-mechanism audio-visual-learning multimodal-learning self-supervised-learning transformer-architecture masked-image-modeling

Python

33

1

1 year ago

praveena2j / Cross-Attentional-AV-Fusion

FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition

affective-computing attention-model audio-visual-learning emotion emotion-recognition multimodal-learning

Python

32

5

10 months ago

jasongief / CPSP

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

audio-visual-learning

Python

31

5

3 years ago

kyuyeonpooh / objects-that-sound

An unofficial implementation of the ECCV 2018 paper "Objects that Sound".

cross-modal-retrieval 深度学习 audio-visual-learning

Python

31

4

2 years ago