wavlm

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python
5656
8 months ago
s3prl/s3prl
Python
2372
1 month ago

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python
879
2 months ago

A low-bitrate single-codebook 16 kHz speech codec based on focal modulation

Python
84
2 months ago

This repo contains the source code of the first deep learning-based singing voice beat tracking system. It leverages the WavLM and DistilHuBERT pre-trained speech models to create vocal embeddings and trains linear multi-head self-attention layers on top of them to extract vocal beat activations. It then uses an HMM decoder to infer singing beats and tempo.

Python
30
3 years ago

A neural speech codec based on discrete WavLM representations

Python
23
8 months ago

This repository applies the WavLM model to the speaker verification task on both good-quality and poor-quality data, and uses the PyCM library for evaluation.

Jupyter Notebook
8
2 years ago

This repository contains the code for the main part of my master's thesis in Data Science & Engineering at Politecnico di Torino.

Python
8
2 years ago

SOTA method for self-supervised speaker verification leveraging a large-scale pretrained ASR model.

Python
7
2 months ago

This repository combines `WavLM`, a powerful speech representation model from Microsoft, with `MSDD` (Multi-Scale Diarization Decoder), a state-of-the-art approach for speaker diarization from Nvidia.

Jupyter Notebook
7
1 month ago

Universal Pooling Method for Speaker Verification Utilizing Pre-trained Multi-layer Features, 2025 preprint

Python
5
7 months ago

This repo contains code used in the paper "Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection"

Python
4
2 years ago

WavLM Large + RawNetX Speaker Verification Base: End-to-End Speaker Verification Architecture

Python
3
1 month ago