vad

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python
12897
4 days ago
Python
2689
10 months ago

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A etc.

C++
1501
18 days ago

Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.

MATLAB
864
4 years ago

An audio/acoustic activity detection and audio segmentation tool

Python
818
10 months ago

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Swift
719
1 day ago

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Python
504
5 months ago

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook
470
1 year ago

Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.

C
414
3 months ago

Runtime Audio Importer plugin for Unreal Engine. Importing audio of various formats at runtime.

C++
395
7 months ago

Integrates WebRTC's VAD for splitting audio files

C
343
5 years ago

Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts

Python
343
1 year ago

On-device voice activity detection (VAD) powered by deep learning

Python
230
9 days ago

Python bindings of WebRTC Audio Processing

C++
197
5 months ago

Statistical model-based voice activity detection

Jupyter Notebook
194
7 years ago

This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming architecture for fluid conversations with immediate responses and natural interruption handling.

Python
194
6 months ago