#vad

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python
12117
5 days ago
Python
2609
8 months ago

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

C++
1456
5 days ago

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

MATLAB
862
4 years ago

An audio/acoustic activity detection and audio segmentation tool

Python
798
8 months ago

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Swift
525
6 hours ago

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Python
499
3 months ago

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook
458
1 year ago

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

C
397
1 month ago

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

C++
392
6 months ago

Integrates WebRTC's VAD for splitting audio files

C
343
5 years ago

Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts

Python
336
9 months ago

On-device voice activity detection (VAD) powered by deep learning

Python
227
6 days ago

Python bindings of WebRTC Audio Processing

C++
193
3 months ago

A statistical model-based Voice Activity Detection

Jupyter Notebook
192
7 years ago

This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.

Python
165
4 months ago