
audio-visual-speech-recognition

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python
9860
5 days ago

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Python
227
1 year ago

[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".

Python
17
1 month ago

Code for the Interspeech 2024 paper "LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition"

Python
16
9 months ago

Official source code for the paper "Tailored Design of Audio-Visual Speech Recognition Models using Branchformers"

Python
11
2 months ago

(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

Python
9
6 months ago

(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

Python
8
1 month ago

🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.

Python
5
5 years ago

Real-Time Audio-Visual Speech Recognition

Python
4
8 months ago

In this repository, I try to use k2, icefall, and Lhotse for lip reading, adapting them to the lip-reading task. Support for many more lip-reading datasets still needs to be added.

Python
2
3 years ago

Code related to the fMRI experiment on the contextual modulation of the McGurk effect

MATLAB
1
3 years ago