Speech & Audio

Speech recognition, speech synthesis, and audio processing frameworks.

Repositories

openai / whisper

Whisper is a general-purpose speech recognition model from OpenAI. Trained on 680,000 hours of diverse audio, it performs multilingual speech recognition, speech translation, and language identification.

Python

103.0k

2 months ago

CorentinJ / Real-Time-Voice-Cloning

Real-time voice cloning system that creates a digital representation of a voice from 5 seconds of audio. Features a GUI and a three-stage deep learning framework (SV2TTS), and runs on either CPU or GPU to generate arbitrary speech from text input.

Python

59.9k

3 months ago

RVC-Boss / GPT-SoVITS

A few-shot voice cloning and TTS system that needs only 1 minute of audio to train a high-quality model. Features zero-shot conversion, multilingual support, and WebUI tools.

Python

58.9k

2 days ago

ggml-org / whisper.cpp

High-performance C/C++ implementation of OpenAI's Whisper speech recognition model. Features hardware acceleration (Metal, CUDA, OpenVINO), real-time transcription, multi-platform support, and lightweight deployment with zero dependencies.

C++

50.9k

3 days ago

coqui-ai / TTS

A deep learning toolkit for text-to-speech synthesis with support for 1100+ languages and voice cloning. Features pre-trained models, training tools, and real-time streaming.

Python

45.6k

2 years ago

mozilla / DeepSpeech

DeepSpeech is Mozilla's open-source speech-to-text engine, built on TensorFlow. It converts speech to text in real time on devices ranging from a Raspberry Pi to GPU servers, and is based on Baidu's Deep Speech research on end-to-end deep learning.

C++

26.8k

a year ago
