tts

CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

deep-learning PyTorch Tensorflow tts voice-cloning Python

Python

54011

8943

8 months ago

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

text-to-speech tts vits voice-clone voice-cloneai voice-cloning

Python

44326

4942

4 days ago

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python text-to-speech deep-learning speech PyTorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion

Python

39433

4993

8 months ago

babysor / MockingBird

🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time

artificial-intelligence speech PyTorch deep-learning text-to-speech tts

Python

36146

5250

5 months ago

2noise / ChatTTS

A generative speech model for daily dialogue.

agent text-to-speech chat ChatGPT chattts chinese chinese-language english english-language gpt large-language-model llm-agent natural-language-inference Python torch tts

Python

35816

3890

1 month ago

mudler / LocalAI

🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference

llama rwkv artificial-intelligence large-language-model stable-diffusion API Kubernetes gpt4all tts musicgen mamba audio-generation image-generation text-generation gemma mistral llama3 rerank distributed libp2p

31887

2429

24 minutes ago

myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

text-to-speech tts voice-clone zero-shot-tts

Python

31860

3253

3 months ago

fishaudio / fish-speech

SOTA Open Source TTS

llama transformer tts valle vits vqgan vqvae

Python

20734

1639

7 days ago

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python

13668

2794

1 hour ago

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

audio-generation gpt-4o text-to-speech tts cantonese chatbot ChatGPT chinese english fine-grained fine-tuning japanese korean natural-language-generation Python voice-cloning

Python

13160

1338

7 hours ago

mastra-ai / mastra

The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.

agents artificial-intelligence chatbots JavaScript large-language-model Next Node.js React TypeScript workflows evals mcp tts

TypeScript

12109

633

3 hours ago

pot-app / pot-desktop

🌈 A cross-platform application for select-to-translate text translation and OCR (text recognition).

translation pot Tauri translate pot-app OCR Linux macOS Windows recognize tts

JavaScript

12010

546

3 months ago

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr speech-recognition voice-cloning vocoder voice-recognition self-supervised-learning Whisper

Python

11796

1904

3 days ago