Repository navigation

#

asr

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python
18004
2 天前

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python
15804
4 小时前

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Python
12266
7 天前

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, support 12 programming languages

C++
7660
5 天前
wzpan/wukong-robot

🤖 wukong-robot 是一个简单、灵活、优雅的中文语音对话机器人/智能音箱项目,支持ChatGPT多轮对话能力,还可能是首个支持脑机交互的开源智能音箱项目。

Python
6993
1 年前

This is a python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headless browser, like other selenium based solutions do!

Python
6240
25 天前

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Jupyter Notebook
5498
2 年前

html5 js 录音 mp3 wav ogg webm amr g711a g711u 格式,支持pc和Android、iOS部分浏览器、Hybrid App(提供Android iOS App源码)、微信,提供ASR语音识别转文字 H5版语音通话聊天示例 DTMF编码解码

JavaScript
5439
6 个月前

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Jupyter Notebook
4999
2 个月前

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python
4821
10 天前

Streamer-Sales 销冠 —— 卖货主播 LLM 大模型🛒🎁,一个能够根据给定的商品特点从激发用户购买意愿角度出发进行商品解说的卖货主播大模型。🚀⭐内含详细的数据生成流程❗ 📦另外还集成了 LMDeploy 加速推理🚀、RAG检索增强生成 📚、TTS文字转语音🔊、数字人生成 🦸、 Agent 使用网络查询实时信息🌐、ASR 语音转文字🎙️、Vue 生态搭建前端🍍、FastAPI 搭建后端🗝️、Docker-compose 打包部署🐋

Python
3495
7 个月前
Python
2689
10 个月前

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

2535
5 个月前

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

C++
2517
2 年前