speech-processing
A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Reading list for research topics in multimodal machine learning
Foundation Architecture for (M)LLMs
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
AI powered speech denoising and enhancement
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Controllable and fast Text-to-Speech for over 7000 languages!
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
General Speech Restoration
SincNet is a neural architecture for efficiently processing raw audio samples.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Open source audio annotation tool for humans
Speech, Language, Audio, Music Processing with Large Language Model
You can find the speech algorithms you want here
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection