Repository navigation

speech-synthesis

Website
Wikipedia

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python text-to-speech 深度学习 speech PyTorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion

Python

42136

5514

1 年前

leon-ai / leon

🧠 Leon is your open-source personal assistant.

leon personal-assistant Node.js Python 人工智能 speech-to-text text-to-speech speech-recognition speech-synthesis flite assistant virtual-assistant 聊天机器人 Bot voice-assistant 自动化 offline 隐私 ai-assistant

TypeScript

16583

1377

16 天前

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal 深度学习 neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python

15431

3053

40 分钟前

NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

机器视觉深度学习 drug-discovery forecasting large-language-models mxnet paddlepaddle PyTorch recommender-systems speech-recognition speech-synthesis Tensorflow tensorflow2 translation 自然语言处理

Jupyter Notebook

14453

3361

1 年前

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr speech-recognition voice-cloning vocoder voice-recognition self-supervised-learning Whisper

Python

12169

1930

6 天前

rhasspy / piper

A fast, local neural text to speech system

speech-synthesis text-to-speech tts

C++

9860

799

1 个月前

espnet / espnet

End-to-End Speech Processing Toolkit

深度学习 end-to-end chainer PyTorch kaldi speech-recognition speech-synthesis speech-translation machine-translation voice-conversion speech-enhancement speech-separation singing-voice-synthesis speaker-diarization text-to-speech

Python

9386

2315

2 天前

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

audio-generation audio-synthesis audioldm music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e voice-conversion audit fastspeech2 vits emilia maskgct vocoder

Python

9302

740

3 个月前

voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

sovits vits voice-conversion so-vits-svc hubert softvc realtime voice-changer 深度学习 PyTorch speech-synthesis Generative Adversarial Network lightning pytorch-lightning Hacktoberfest

Python

9099

1215

3 天前

rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

tts speech-synthesis text-to-speech

Python

8873

823

15 天前

netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

PyTorch speech speech-synthesis tts multi-speaker text-to-speech 深度学习 prompt emotivoice 人工智能 Python emotion style

Python

8139

710

1 年前

jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

tts text-to-speech PyTorch 深度学习 speech-synthesis

Python

7624

1374

2 年前

yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

深度学习 PyTorch speaker-adaptation speech-synthesis text-to-speech tts wavlm diffusion-models latent-diffusion latent-diffusion-models Generative Adversarial Network

Python

5917

604

1 年前

espeak-ng / espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than hundred languages and accents.

espeak-ng espeak Android text-to-speech speech-synthesis

5449

1081

1 天前

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook

5443

342

2 年前

MoonInTheRiver / DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

text-to-speech diffusion-speedup tts aaai2022 singing-synthesis diffusion-model speech-synthesis singing-voice-synthesis singing-voice singing-voice-database MIDI

Python

4585

767

5 个月前

abus-aikorea / voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.