rnnt

A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more.

conformer PyTorch speech-recognition paraformer punctuation speaker-diarization rnnt audio-visual-speech-recognition pretrained-model voice-activity-detection Whisper dfsmn vad speechgpt speechllm

Python

12116

1223

5 days ago

upskyy / Transformer-Transducer

PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)

rnnt transformer sequence-to-sequence end-to-end speech-recognition

Python

108

3 years ago

stevenhillis / awesome-asr-contextualization

A curated list of awesome papers on contextualizing E2E ASR outputs

Awesome Lists asr speech-recognition speech-processing transducers error-correction rnnt

2 years ago

iamjanvijay / rnnt_decoder_cuda

An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.

CUDA rnnt beam-search speech-recognition speech-to-text handwriting-recognition

Cuda

5 years ago

iamjanvijay / rnnt

An implementation of RNN-Transducer loss in TF-2.0.

rnnt

Python

2 years ago

manhph2211 / ViSTT

I'm building an end-to-end Vietnamese Speech Recognition System. I'll deploy it into production with the help of Flask, Uwsgi, Nginx, and AWS ...

rnnt speech-to-text sst Flask nginx PyTorch uwsgi pytorch-lightning hydra conformer

Python

3 years ago

tuanio / conformer-rnnt

Conformer RNN-Transducer

conformer Python rnnt speech-recognition

Python

3 years ago

aidayang / FunASR-OneClick

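Real-time speech recognition build of FunASR: recognizes audio from the microphone and audio played on the computer; voice-typing software for PCs.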

audio-visual-speech-recognition conformer dfsmn paraformer pretrained-models punctuation PyTorch rnnt speaker-diarization speech-recognition speechgpt speechllm vad voice-activity-detection Whisper

3 months ago

George0828Zhang / ssnt_loss

Pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction" https://arxiv.org/abs/1609.08194

Python PyTorch rnnt

Python

3 years ago

Andersonjesusvital / Speech-Recognition-RNN

Deep learning-based subtitle generation model that processes audio datasets to generate accurate text transcriptions. Includes audio feature extraction, encoder-decoder architecture, training pipelines, and evaluation metrics for subtitle alignment.

deep-neural-networks dnn gru lstm recurrent-neural-networks rnn rnnt sequence-to-sequence speech-to-text Tensorflow

2 days ago