text-to-speech

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

text-to-speech tts vits voice-clone voice-cloneai voice-cloning

Python

51315

5644

25 days ago

unslothai / unsloth

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.

fine-tuning llama large-language-model mistral gemma llama3 unsloth deepseek deepseek-r1 gemma3 text-to-speech tts qwen qwen3 agent openai gpt-oss voice-cloning reinforcement-learning

Python

46566

3807

2 days ago

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python text-to-speech deep-learning speech PyTorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion

Python

42854

5661

1 year ago

2noise / ChatTTS

A generative speech model for daily dialogue.

agent text-to-speech chat ChatGPT chattts chinese chinese-language english english-language gpt large-language-model llm-agent natural-language-inference Python torch tts

Python

37903

4102

3 months ago

babysor / MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time

artificial-intelligence speech PyTorch deep-learning text-to-speech tts

Python

36673

5263

1 year ago

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

text-to-speech tts voice-clone zero-shot-tts

Python

34558

3795

6 months ago

nari-labs / dia

A TTS model capable of generating ultra-realistic dialogue in one pass.

artificial-intelligence text-to-speech

Python

18499

1584

3 months ago

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

audio-generation gpt-4o text-to-speech tts cantonese chatbot ChatGPT chinese english fine-grained fine-tuning japanese korean multi-lingual natural-language-generation Python cosyvoice cross-lingual voice-cloning

Python

16710

1815

7 days ago

leon-ai / leon

🧠 Leon is your open-source personal assistant.

leon personal-assistant Node.js Python artificial-intelligence speech-to-text text-to-speech speech-recognition speech-synthesis flite assistant virtual-assistant chatbot Bot voice-assistant automation offline privacy ai-assistant

TypeScript

16690

1385

21 days ago

jianchang512 / pyvideotrans

Translate a video from one language to another and add dubbing. Also supports speech-recognition transcription, speech synthesis, and subtitle translation.

text-to-speech video-transition speech-to-text

Python

14438

1675

16 days ago

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

cross-lingual text-to-speech tts voice-clone zero-shot-tts

Python

12704

1356

3 days ago

rhasspy / piper

A fast, local neural text to speech system

speech-synthesis text-to-speech tts

C++

10104

827

1 month ago

mozilla / TTS

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

deep-learning text-to-speech Python PyTorch tacotron tts speaker-encoder dataset-analysis tacotron2 tensorflow2 vocoder melgan glow-tts speech

Jupyter Notebook

10012

1316

2 years ago

espnet / espnet

End-to-End Speech Processing Toolkit

deep-learning end-to-end chainer PyTorch kaldi speech-recognition speech-synthesis speech-translation machine-translation voice-conversion speech-enhancement speech-separation singing-voice-synthesis speaker-diarization text-to-speech

Python

9494

2326

3 days ago

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

audio-generation audio-synthesis audioldm music-generation naturalspeech2 singing-voice-conversion speech-synthesis text-to-audio text-to-speech vall-e voice-conversion audit fastspeech2 vits emilia maskgct vocoder

Python

9420

760

4 months ago

rany2 / edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

tts speech-synthesis text-to-speech

Python

9151

863

1 month ago

netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

PyTorch speech speech-synthesis tts multi-speaker text-to-speech deep-learning prompt emotivoice artificial-intelligence Python emotion style

Python

8335

732

1 year ago

Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

emotional-speech gpt text-to-speech voice-clone transformer-architecture tts vall-e

Python

7924

788

2 years ago

jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

tts text-to-speech PyTorch deep-learning speech-synthesis

Python

7701

1376

2 years ago

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 12 programming languages

asr onnx Windows Linux macOS C++ Android iOS raspberry-pi aarch64 arm32 C# .NET mfc speech-to-text text-to-speech vits RISC-V lazarus object-pascal

C++

7660

889

5 days ago