speech

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python text-to-speech deep-learning speech PyTorch tts vocoder tacotron glow-tts melgan speaker-encoder hifigan speaker-encodings multi-speaker-tts tts-model speech-synthesis voice-cloning voice-synthesis voice-conversion

Python

42136

5514

1 year ago

babysor / MockingBird

🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time

artificial-intelligence speech PyTorch deep-learning text-to-speech tts

Python

36558

5264

9 months ago

svc-develop-team / so-vits-svc

SoftVC VITS Singing Voice Conversion

artificial-intelligence audio-analysis Generative Adversarial Network singing-voice-conversion so-vits-svc sovits variational-inference vc vits voice voice-conversion voiceconversion voice-changer flow deep-learning PyTorch speech

Python

27531

5034

2 years ago

huggingface / datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

natural-language-processing datasets PyTorch TensorFlow pandas NumPy computer-vision machine-learning deep-learning speech artificial-intelligence large-language-model

Python

20525

2901

18 hours ago

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text Whisper

Python

17365

1832

2 months ago

IDEA-Research / Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

open-vocabulary-detection open-vocabulary-segmentation data-generation automatic-labeling-system caption speech image-editing

Jupyter Notebook

16801

1527

1 year ago

kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

kaldi C++ CUDA Shell speech-recognition speech-to-text speaker-verification speaker-id speech

Shell

15057

5365

1 month ago

AIGC-Audio / AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

audio gpt music sound speech talking-head

Python

10189

859

1 year ago

mozilla / TTS

🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

deep-learning text-to-speech Python PyTorch tacotron tts speaker-encoder dataset-analysis tacotron2 tensorflow2 vocoder melgan glow-tts speech

Jupyter Notebook

9960

1310

2 years ago

modelscope / modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

natural-language-processing cv speech multi-modal science deep-learning machine-learning Python

Python

8264

860

2 days ago

netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

PyTorch speech speech-synthesis tts multi-speaker text-to-speech deep-learning prompt emotivoice artificial-intelligence Python emotion style

Python

8141

710

1 year ago

PaddlePaddle / models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.

paddlepaddle deep-learning neural-network computer-vision natural-language-processing recommendation speech cv models

Python

6932

2882

7 months ago

TalAter / annyang

💬 Speech recognition for your site

speech-recognition speech speech-to-text voice

JavaScript

6664

1046

1 year ago

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

voice-detection voice-recognition voice-commands PyTorch onnx voice-activity-detection voice-control onnx-runtime onnxruntime speech speech-processing vad

Python

6581

609

2 months ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook

5442

342

2 years ago

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text Whisper

Jupyter Notebook

4853

454

2 days ago

metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS

text-to-speech artificial-intelligence deep-learning PyTorch speech speech-synthesis tts voice-clone zero-shot-tts

Python

4149

692

1 year ago

fixie-ai / ultravox

A fast multimodal LLM for real-time voice

artificial-intelligence large-language-model slm speech

Python

4145

334

2 days ago

huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o

artificial-intelligence assistant language-model machine-learning Python speech speech-synthesis speech-to-text speech-translation

Python

4145

468

4 months ago

jianchang512 / stt

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

speech speech-recognition speech-to-text stt

Python

3729

398

15 days ago