Speech & Audio

Speech recognition, speech synthesis, and audio processing frameworks.

Repositories

Whisper is a general-purpose speech recognition model by OpenAI. Trained on 680,000 hours of diverse audio, it performs robust multilingual speech recognition, speech translation, and language identification.

Python
96.3k
3 months ago
CorentinJ/Real-Time-Voice-Cloning

Real-time voice cloning system that creates a digital representation of a voice from 5 seconds of audio. Features a GUI, a three-stage deep learning framework (SV2TTS), and CPU and GPU support for generating arbitrary speech from text input.

Python
59.5k
12 days ago

A powerful few-shot voice cloning and TTS system that requires only 1 minute of audio to train high-quality models, featuring zero-shot conversion, multilingual support, and comprehensive WebUI tools.

Python
56.0k
a month ago
ggml-org/whisper.cpp

High-performance C/C++ implementation of OpenAI's Whisper speech recognition model. Features hardware acceleration (Metal, CUDA, OpenVINO), real-time transcription, multi-platform support, and lightweight deployment with zero dependencies.

C++
47.8k
3 hours ago

A deep learning toolkit for text-to-speech synthesis with support for 1100+ languages and voice cloning. Features pre-trained models, training tools, and real-time streaming.

Python
44.9k
2 years ago

DeepSpeech is Mozilla's open-source speech-to-text engine, built on TensorFlow. It converts speech to text in real time on devices ranging from a Raspberry Pi to GPU servers, and is based on Baidu's Deep Speech research on end-to-end deep learning.

C++
26.7k
9 months ago