Speech & Audio

Speech recognition, speech synthesis, and audio processing frameworks.

Repositories

Whisper is OpenAI's general-purpose speech recognition model. Trained on 680,000 hours of diverse audio, it handles multilingual speech recognition, speech translation, and language identification.

Python
99.1k
23 days ago
CorentinJ/Real-Time-Voice-Cloning

Real-time voice cloning system that builds a digital representation of a voice from 5 seconds of audio. It includes a GUI, implements the three-stage SV2TTS deep learning framework, and runs on either CPU or GPU to generate arbitrary speech from text input.

Python
59.7k
2 months ago

A few-shot voice cloning and TTS system that needs only 1 minute of audio to train a high-quality model. Features zero-shot conversion, multilingual support, and WebUI tools.

Python
57.3k
8 days ago
ggml-org/whisper.cpp

High-performance C/C++ implementation of OpenAI's Whisper speech recognition model. Features hardware acceleration (Metal, CUDA, OpenVINO), real-time transcription, multi-platform support, and lightweight deployment with zero dependencies.

C++
49.5k
a day ago

A deep learning toolkit for text-to-speech synthesis with voice cloning and support for 1100+ languages. Features pre-trained models, training tools, and real-time streaming.

Python
45.2k
2 years ago

DeepSpeech is Mozilla's open-source speech-to-text engine, built on TensorFlow. It converts speech to text in real time on devices ranging from a Raspberry Pi to GPU servers, and is based on Baidu's Deep Speech research on end-to-end deep learning.

C++
26.8k
a year ago