Speech and audio

Frameworks for speech recognition, speech synthesis, and audio processing.

Repositories

Whisper is a general-purpose speech recognition model from OpenAI. Trained on 680,000 hours of diverse audio, it supports multilingual speech recognition, speech translation, and language identification.

Python
95.3k
ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

C++
47.2k

As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning).

Python
55.4k
CorentinJ/Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Python
59.5k

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python
44.7k

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

C++
26.7k