Repository navigation

speech-to-text


ggml-org / whisper.cpp

Port of OpenAI's Whisper model in C/C++

openai speech-to-text transformer Whisper inference speech-recognition

C++

43624

4775

3 days ago

mozilla / DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

deep-learning machine-learning neural-networks Tensorflow speech-recognition speech-to-text deepspeech embedded on-device offline

C++

26607

4081

4 months ago

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

deep-learning inference quantization speech-recognition speech-to-text transformer Whisper openai

Python

18394

1519

2 months ago

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text Whisper

Python

18004

1892

2 days ago

leon-ai / leon

🧠 Leon is your open-source personal assistant.

leon personal-assistant Node.js Python artificial-intelligence speech-to-text text-to-speech speech-recognition speech-synthesis flite assistant virtual-assistant chatbot Bot voice-assistant automation offline privacy ai-assistant

TypeScript

16690

1385

21 days ago

kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

kaldi C++ CUDA Shell speech-recognition speech-to-text speaker-verification speaker-id speech

Shell

15149

5369

13 days ago

jianchang512 / pyvideotrans

Translate a video from one language to another and add dubbing, with support for speech recognition transcription, speech synthesis, and subtitle translation.

text-to-speech video-transition speech-to-text

Python

14438

1675

16 days ago

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Jupyter Notebook

13328

1581

24 days ago

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

speech-recognition speech-toolkit speaker-recognition speech-to-text speech-enhancement speech-separation audio audio-processing speech-processing speechrecognition asr voice-recognition speaker-diarization speaker-verification PyTorch huggingface transformers language-model deep-learning

Python

10511

1559

9 days ago

Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

Python audio speech-recognition speech-to-text

Python

8871

2429

20 days ago

KoljaB / RealtimeSTT

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

Python realtime speech-to-text

Python

8696

730

3 months ago

nl8590687 / ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

Tensorflow cnn ctc Python Keras speech-recognition speech-to-text chinese-speech-recognition asrt

Python

8244

1910

1 month ago

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 12 programming languages

asr onnx Windows Linux macOS C++ Android iOS raspberry-pi aarch64 arm32 C# .NET mfc speech-to-text text-to-speech vits RISC-V lazarus object-pascal

C++

7660

889

5 days ago

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

artificial-intelligence asr gpt-4o speech-recognition speech-to-text aigc cross-lingual large-language-model Python PyTorch multilingual

Python

6706

617

2 months ago

TalAter / annyang

💬 Speech recognition for your site

speech-recognition speech speech-to-text voice

JavaScript

6663

1043

1 year ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook

5498

343

2 years ago

modelscope / FunClip

Open-source, accurate and easy-to-use video speech recognition & clipping tool, with integrated LLM-based AI clipping.

speech-recognition video-clip video-subtitles subtitles-generator speech-to-text gradio gradio-python-llm large-language-model

Python

5001

594

3 months ago

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text Whisper

Jupyter Notebook

4999

466

2 months ago

abus-aikorea / voice-pro

Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.

faster-whisper tts Whisper gradio subtitles transcription translator webui speech-recognition speech-synthesis speech-to-text text-to-speech yt-dlp voice-cloning podcasts audiobook voice-conversion karaoke whisperx

Python

4840

414

2 months ago

sanchit-gandhi / whisper-jax

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

deep-learning jax speech-recognition speech-to-text Whisper

Jupyter Notebook

4633

406

2 years ago