speech-processing

A PyTorch-based Speech Toolkit

speech-recognition speech-toolkit speaker-recognition speech-to-text speech-enhancement speech-separation audio audio-processing speech-processing speechrecognition asr voice-recognition speaker-diarization speaker-verification PyTorch huggingface transformers language-model deep-learning

Python

10511 stars

1559 forks

Updated 9 days ago

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

PyTorch speech-processing speaker-diarization voice-activity-detection pretrained-models speaker-recognition speaker-verification

Jupyter Notebook

8422 stars

949 forks

Updated 11 hours ago

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

voice-detection voice-recognition voice-commands PyTorch onnx voice-activity-detection voice-control onnx-runtime onnxruntime speech speech-processing vad

Python

6996 stars

639 forks

Updated 1 month ago

pliang279 / awesome-multimodal-ml

Reading list for research topics in multimodal machine learning

multimodal-learning machine-learning representation-learning natural-language-processing computer-vision speech-processing Robotics healthcare reading-list deep-learning reinforcement-learning

6643 stars

891 forks

Updated 1 year ago

microsoft / torchscale

Foundation Architecture for (M)LLMs

computer-vision machine-learning multimodal natural-language-processing pretrained-language-model speech-processing transformer translation

Python

3117 stars

219 forks

Updated 1 year ago

linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

deep-learning speech speech-recognition speech-to-text asr machine-learning Python PyTorch attention-is-all-you-need attention-mechanism attention-model speaker-diarization speech-processing transformers Whisper

Python

2615 stars

199 forks

Updated 1 month ago

r9y9 / wavenet_vocoder

WaveNet vocoder

wavenet speech-synthesis speech-processing PyTorch Python neural-vocoder speech

Python

2366 stars

496 forks

Updated 2 years ago

resemble-ai / resemble-enhance

AI-powered speech denoising and enhancement

denoise speech-denoising speech-enhancement speech-processing

Python

1991 stars

237 forks

Updated 10 months ago

r9y9 / deepvoice3_pytorch

PyTorch implementation of convolutional neural network-based text-to-speech synthesis models

tts speech-synthesis end-to-end speech-processing machine-learning PyTorch Python multi-speaker

Python

1980 stars

486 forks

Updated 2 years ago

wq2012 / awesome-diarization

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

speaker-diarization Awesome Lists machine-learning speech-recognition speech-processing deep-learning

1805 stars

238 forks

Updated 2 months ago

DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

text-to-speech toolkit speech-synthesis deep-learning speech-processing tts PyTorch speech

Python

1644 stars

187 forks

Updated 3 months ago

TEN-framework / ten-vad

Voice Activity Detector (VAD): low-latency, high-performance and lightweight

conversational-ai real-time speech-processing vad voice-activity-detection voice-commands voice-recognition audio automatic-speech-recognition speech silero-vad voice-agent

1474 stars

122 forks

Updated 19 days ago

coqui-ai / open-speech-corpora

💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies

tts stt speech-to-text text-to-speech speech-recognition speech-synthesis speech-processing voice-recognition voice-activity-detection voice-cloning speech-separation

1358 stars

148 forks

Updated 1 year ago

haoheliu / voicefixer

General Speech Restoration

speech-processing speech-synthesis speech-enhancement speech-analysis speech tts denoise super-resolution vocoder

Python

1215 stars

147 forks

Updated 8 months ago

mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Python

1198 stars

270 forks

Updated 4 years ago

ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

seamless speech speech-recognition speech-synthesis speech-to-text speech-translation translation all-in-one machine-translation streaming-audio text-to-speech asr tts voice text-to-audio non-autoregressive speech-enhancement audio-processing speech-processing

Python

1155 stars

Updated 3 months ago

midas-research / audino

Open source audio annotation tool for humans

audio-processing speech-processing machine-learning annotation-tool audio-annotation Python datasets

JavaScript

1113 stars

137 forks

Updated 8 months ago

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

audio-processing large-language-models multimodal-large-language-models peft speech-processing

Python

896 stars

Updated 1 month ago

Ryuk17 / SpeechAlgorithms

You can find the speech algorithms you want here

speech-processing

832 stars

259 forks

Updated 2 months ago

nyrahealth / CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

asr audio detection recognition speech speech-recognition transcription Whisper speech-processing

Python

828 stars

Updated 4 months ago