asr

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

asr speech speech-recognition speech-to-text Whisper

Python

15039

1636

7 days ago

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation speaker-recognition asr tts generative-ai multimodal deep-learning neural-networks speaker-diariazation speech-translation speech-synthesis large-language-models

Python

13668

2794

2 hours ago

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

transformer conformer speech-translation streaming-asr speech-alignment punctuation-restoration streaming-tts speech-synthesis tts asr speech-recognition voice-cloning vocoder voice-recognition self-supervised-learning Whisper

Python

11796

1904

3 days ago

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

speech-recognition speech-toolkit speaker-recognition speech-to-text speech-enhancement speech-separation audio audio-processing speech-processing speechrecognition asr voice-recognition speaker-diarization speaker-verification PyTorch huggingface transformers language-model deep-learning

Python

9706

1473

3 days ago

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

speech-recognition asr voice-recognition speech-to-text Android iOS raspberry-pi deep-learning deep-neural-networks speech-to-text-android speaker-verification Python offline privacy kaldi deepspeech vosk stt

Jupyter Notebook

9273

1246

1 month ago

wzpan / wukong-robot

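🤖 wukong-robot is a simple, flexible, and elegant Chinese-language voice dialogue robot / smart speaker project. It supports multi-turn conversations with ChatGPT and may be the first open-source smart speaker project to support brain-computer interaction.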

artificial-intelligence speaker asr tts unit Home Assistant raspeberry-pi amazon-echo alexa snowboy google-home anyq muse bci ChatGPT gpt3 openai

Python

6787

1386

6 months ago

k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 11 programming languages

asr onnx Windows Linux macOS C++ Android iOS raspberry-pi aarch64 arm32 C# .NET mfc speech-to-text text-to-speech vits RISC-V lazarus object-pascal

C++

5662

634

2 days ago

TEN-framework / TEN-Agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

agent gemini gpt-4 large-language-models multimodal nextjs14 openai realtime voice-assistant C++ Go Python artificial-intelligence gpt-4o rag vision real-time asr low-latency tts

Python

5633

638

1 day ago

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

artificial-intelligence asr gpt-4o speech-recognition speech-to-text aigc large-language-models Python PyTorch multilingual

Python

5376

482

1 month ago

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

speech-recognition speech-to-text stt asr pretrained-models english german spanish stt-benchmark PyTorch colab onnx text-to-speech speech speech-synthesis tts

Jupyter Notebook

5236

336

2 years ago

xiangyuecn / Recorder

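HTML5 JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC browsers, some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call/chat example, and DTMF encoding and decoding.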

recorder record JavaScript HTML h5 luyin mp3 wav amr ogg webm WebRTC audio recording asr

JavaScript

5180

1062

19 days ago

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

asr edge-computing large-language-models on-device-ai on-device-ml SDK stable-diffusion transformers tts vlm language-model sdk-python Whisper audio

Python

4503

627

1 month ago

wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

e2e-models PyTorch asr transformer conformer production-ready automatic-speech-recognition speech-recognition Whisper

Python

4461

1130

21 days ago

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

asr speaker-diarization speech speech-recognition speech-to-text Whisper

Jupyter Notebook

4399

403

1 month ago

jdepoix / youtube-transcript-api

This is a Python API that lets you get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions.

youtube-api subtitles YouTube transcripts Python subtitle command-line-interface captions asr

Python

3762

434

25 days ago

PeterH0323 / Streamer-Sales

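Streamer-Sales (销冠, "sales champion"): a livestream sales-host LLM 🛒🎁 that presents products based on their given features, with the aim of sparking viewers' desire to buy. 🚀⭐ Includes a detailed data-generation pipeline❗ 📦 It also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, ASR (speech-to-text) 🎙️, a Vue-based frontend 🍍, a FastAPI backend 🗝️, and Docker-compose packaging and deployment 🐋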

chat-application internlm2 large-language-models chatbot text-generation chat ChatGPT gpt rag tts asr digital-human

Python

3170

492

1 month ago

tensorflow / lingvo

Lingvo

speech-recognition translation speech-to-text machine-translation mnist seq2seq language-model tts asr lm natural-language-processing Tensorflow speech research distributed gpu-computing speech-synthesis

Python

2836

450

2 days ago

ahmetoner / whisper-asr-webservice

OpenAI Whisper ASR Webservice API

automatic-speech-recognition speech-recognition speech-to-text openai-whisper Docker asr speech

Python

2536

452

2 months ago

coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

stt speech-to-text Tensorflow deep-learning automatic-speech-recognition asr voice-recognition speech-recognition

C++

2415

285

1 year ago

mravanelli / pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

speech-recognition gru dnn kaldi rnn-model PyTorch timit deep-learning deep-neural-networks recurrent-neural-networks multilayer-perceptron-network lstm speech asr rnn

Python

2385

445

3 years ago