Repository navigation

#

speech-synthesis

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python
15431
40 分钟前

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook
14453
1 年前

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Python
12169
6 天前

A fast, local neural text to speech system

C++
9860
1 个月前
open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Python
9302
3 个月前

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python
8873
15 天前

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python
7624
2 年前

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python
5917
1 年前

eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.

C
5449
1 天前

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Jupyter Notebook
5443
2 年前

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

Python
4585
5 个月前
abus-aikorea/voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

Python
4393
1 个月前

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook
4334
2 个月前
Python
4145
4 个月前