voice-synthesis

Machine-learning-based speech synthesis Electron app, with voices of specific video game characters

JavaScript
618
1 year ago

This repository contains an implementation of "Neural Voice Cloning With Few Samples"

Python
436
4 years ago

A full reproduction of the StarGAN-VC paper, with stable training and better audio quality.

Python
247
1 year ago

Desktop application for neural speech synthesis written in C++

C++
214
2 years ago

A programmable version of Neil Thapen's Pink Trombone

JavaScript
177
3 months ago

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. SV2TTS is a three-stage deep learning framework that makes it possible to create a numerical representation of a voice from a few seconds of audio, and to use it to condition a text-to-speech model trained to generalize to new voices.

Python
170
5 years ago

Automatic video translator: translates hard-coded video subtitles, translates and dubs videos, removes any on-screen text or subtitles, and restores subtitle styling after translation.

Python
151
1 year ago

Welcome to the Microsoft Voice Assistant samples repository! Here you will find samples to help you get started building a client application for your bot or Custom Command service. You will also be able to easily deploy a working Custom Command-based Voice Assistant to your own Azure subscription.

C++
115
2 years ago

Voice stress analysis (VSA) aims to differentiate between stressed and non-stressed outputs in response to stimuli (e.g., questions posed), with high stress seen as an indication of deception. In this work, we propose a deep learning-based psychological stress detection model using speech signals. With increasing demands for communication between humans and intelligent systems, automatic stress detection is becoming an interesting research topic. Stress can be reliably detected by measuring the level of specific hormones (e.g., cortisol), but this is not a convenient method for detecting stress in human-machine interactions. The proposed algorithm first extracts Mel-filter bank coefficients from pre-processed speech data and then predicts a binary stress status (i.e., stressed or unstressed) using a CNN (convolutional neural network) followed by dense, fully connected layers.

Jupyter Notebook
92
4 years ago

An unofficial Eleven Labs voice synthesis client for Unity (UPM)

C#
91
5 days ago

An unofficial ElevenLabs RESTful API client for .NET

C#
64
2 months ago

TypeScript
60
2 years ago

💬 "Realtime" voice transcription and cloning using ElevenLabs's API.

TypeScript
53
2 years ago