voice-activity-detection

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

conformer PyTorch speech-recognition paraformer punctuation speaker-diarization rnnt audio-visual-speech-recognition pretrained-model voice-activity-detection Whisper dfsmn vad speechgpt speechllm

Python

12898

1307

4 days ago

noisetorch / NoiseTorch

Real-time microphone noise suppression on Linux.

noise-reduction noise-suppression voice voice-activity-detection voice-activated pulseaudio Linux Hacktoberfest hacktoberfest2023

9881

244

9 months ago

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch speech-processing speaker-diarization voice-activity-detection pretrained-models speaker-recognition speaker-verification

Jupyter Notebook

8422

949

15 hours ago

smacke / ffsubsync

Automagically synchronize subtitles with video.

subtitles Video audio FFmpeg vad fft synchronization sync subtitle captions vlc vlc-media-player srt srt-subtitles voice-activity-detection fast-fourier-transform alignment caption

Python

7361

300

1 month ago

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

voice-detection voice-recognition voice-commands PyTorch onnx voice-activity-detection voice-control onnx-runtime onnxruntime speech speech-processing vad

Python

6996

639

1 month ago

jim-schwoebel / voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

datasets dataset voice data voice-control voice-synthesis voice-commands voice-assistant voice-recognition voice-chat voice-activity-detection voice-conversion noise

2033

250

1 year ago

ricky0123 / vad

Voice activity detector (VAD) for the browser with a simple API

onnxruntime silero-vad speech-to-text TypeScript voice-activity-detection Web web-audio-api

TypeScript

1619

228

11 days ago

k2-fsa / sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.

Python speech-recognition C++ asr C C# Go Kotlin vad voice-activity-detection

C++

1501

196

18 days ago

TEN-framework / ten-vad

Voice Activity Detector (VAD): low-latency, high-performance, and lightweight

conversational-ai real-time speech-processing vad voice-activity-detection voice-commands voice-recognition audio automatic-speech-recognition speech silero-vad voice-agent

1475

122

19 days ago

juanmc2005 / diart

A Python package to build AI-powered real-time audio applications

speaker-diarization streaming-audio real-time deep-learning transcription voice-activity-detection

Python

1470

107

8 months ago

coqui-ai / open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

tts stt speech-to-text text-to-speech speech-recognition speech-synthesis speech-processing voice-recognition voice-activity-detection voice-cloning speech-separation

1358

148

1 year ago

ggeop / Python-ai-assistant

Python AI assistant 🧠

Python voice-recognition voice-assistant voice-control voice-activity-detection voice-chat natural-language-processing voice-commands artificial-intelligence scikit-learn nltk google-speech-to-text MongoDB pymongo

Python

991

247

1 year ago

iamsrikanthnani / pluely

The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.

ai-assistant claude desktop-app gemini grok llm openai React Rust shadcn speech-to-text stealth-game Tailwind CSS Tauri TypeScript undetectable voice-activity-detection

TypeScript

884

129

18 hours ago

jtkim-kaist / VAD

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

vad dnn lstm attention speech data voice-detection speech-recognition voice-activity-detection

MATLAB

864

235

4 years ago

ina-foss / inaSpeechSegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

audio-analysis speech music voice-activity-detection noise segmentation Transgender

Python

839

142

16 days ago

amsehili / auditok

An audio/acoustic activity detection and audio segmentation tool

voice-detection vad voice-activity-detection

Python

818

10 months ago

FluidInference / FluidAudio

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

coreml iOS macOS speaker-diarization speaker-identification speaker-recognition Swift audio real-time vad voice-activity-detection asr automatic-speech-recognition speech-to-text ane Nvidia

Swift

719

1 day ago

baxtree / subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

subtitles captions alignment subrip voice-activity-detection tmp transcription

Python

485

2 months ago