speech
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
SoftVC VITS Singing Voice Conversion
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
kaldi-asr/kaldi is the official location of the Kaldi project.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
ModelScope: bring the notion of Model-as-a-Service to life.
Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.
💬 Speech recognition for your site
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Foundational model for human-like, expressive TTS
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Code examples for new APIs of iOS 10.