
voice-activity-detection

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.

Python
12898
4 days ago

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook
8422
15 hours ago

Voice activity detector (VAD) for the browser with a simple API

TypeScript
1619
11 days ago

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

C++
1501
18 days ago

juanmc2005/diart

A Python package to build AI-powered real-time audio applications

Python
1470
8 months ago

The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.

TypeScript
884
18 hours ago

Voice activity detection (VAD) toolkit including DNN-, bDNN-, LSTM- and ACAM-based VAD. We also provide our directly recorded dataset.

MATLAB
864
4 years ago

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

Python
839
16 days ago

An audio/acoustic activity detection and audio segmentation tool

Python
818
10 months ago

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Swift
719
1 day ago

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

Python
485
2 months ago

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook
470
1 year ago

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

C
414
3 months ago