speaker-recognition

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python
15805
3 hours ago

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook
8422
15 hours ago
google/uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

Python
1585
1 year ago

In defence of metric learning for speaker recognition

Python
1133
2 years ago

This project uses several advanced voiceprint recognition models, including EcapaTdnn, ResNetSE, ERes2Net, and CAM++; more models may be supported in the future. It also supports the MelSpectrogram and Spectrogram data preprocessing methods.

Python
1123
4 months ago

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python
1042
18 days ago
C++
961
3 years ago

🔈 Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

Python
790
6 years ago

Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)

Python
736
1 year ago

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Swift
719
3 hours ago

Speaker diarization with uis-rnn and speaker embeddings from vgg-speaker-recognition

Python
491
4 years ago

This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.

Python
431
2 months ago

Aims to create a comprehensive voice toolkit for training, testing, and deploying speaker verification systems.

Python
395
1 year ago

The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.

HTML
371
4 months ago

A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.

Tcl
335
2 years ago

Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196

Python
320
5 years ago