silero-vad
Voice Activity Detector (VAD) from TEN: low-latency, high-performance, and lightweight.
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
A sophisticated real-time voice assistant that seamlessly integrates speech recognition, AI reasoning, and neural text-to-speech synthesis. It is designed for natural conversational interactions with advanced tool-calling capabilities.
A real-time Voice Activity Detection (VAD) library for iOS and macOS using Silero models powered by ONNX Runtime. Includes advanced noise suppression and audio preprocessing with WebRTC APM, supporting seamless WAV data output with header metadata.
In this repository, I show you how to use Silero VAD with ONNX Runtime Web to run the VAD completely in the browser.
iOS Voice Activity Detection (VAD). Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
VAD is a cross-platform Dart binding for the VAD JavaScript library. This package provides access to a Voice Activity Detection (VAD) system, allowing Flutter applications to start and stop VAD-based listening and handle various VAD events.
Uses the excellent Silero VAD with the onnxruntime C API for fast detection of audio segments containing speech.
Audio transcription using MLX Whisper and VAD-based silence processing.
Enterprise VAD (Voice Activity Detection) in C#/.NET (.NET 6.0+) with Microsoft.ML.Net, ONNXRuntime, and DirectML. The easiest, most efficient, and performant Silero VAD implementation! Always open for PRs.
Python script for detecting silences with Silero VAD and transcribing with the Whisper AI model.
This repo provides an addon that can perform VAD model inference in Node.js and Electron environments, based on cmake-js and FastDeploy. Silero VAD is a pre-trained enterprise-grade Voice Activity Detector.
C++ implementation of real-time Voice Activity Detection (VAD) using Silero models with ONNX Runtime and WebRTC Audio Processing. Provides precise voice segmentation and cross-platform XCFramework support.
Experimental voice user interface (VUI) to interact with an agentic AI assistant
Test comparison of two VAD models with English and multilingual speech datasets
A voice assistant with local LLM as a backend
Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client to server for live transcription and optional translation. Supports CLI and Python API.
YouTube live-stream text (captions) in the CLI.
Deploy Whisper scalably on AWS.
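Most of the libraries above expose a similar shape of API: feed fixed-size PCM frames in, get per-frame speech decisions or speech segments out. As a rough illustration of that interface only — this is a toy energy-threshold detector, not Silero VAD and not any specific library's API — a minimal sketch:

```python
import math

def frame_energy_vad(samples, sample_rate=16000, frame_ms=30, threshold=0.01):
    """Toy energy-threshold VAD (a stand-in, NOT Silero VAD).

    Splits the signal into fixed-size frames, computes per-frame RMS energy,
    and merges consecutive "loud" frames into (start, end) sample ranges.
    Real VADs like Silero use a neural model instead of an energy threshold,
    but the frames-in / segments-out interface is the same.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    segments, seg_start = [], None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        if rms >= threshold:
            if seg_start is None:
                seg_start = i          # speech segment begins at this frame
        elif seg_start is not None:
            segments.append((seg_start, i))  # segment ends at first quiet frame
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(samples)))
    return segments

# Synthetic signal: 0.5 s silence, 0.5 s 440 Hz tone, 0.5 s silence at 16 kHz.
sr = 16000
silence = [0.0] * (sr // 2)
tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 2)]
signal = silence + tone + silence
print(frame_energy_vad(signal, sr))  # → [(7680, 16320)]
```

The detected segment is slightly wider than the exact tone boundaries (samples 8000–16000) because detection is frame-granular; production VADs add hysteresis and minimum-duration rules on top of the per-frame decision for the same reason.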