text-to-audio
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Generate audiobooks from EPUBs, PDFs and text with synchronized captions.
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A web UI for various audio-related neural networks
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
A family of diffusion models for text-to-audio generation.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
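The flow-matching objective that TangoFlux-style text-to-audio models train with can be illustrated in a few lines: sample a point on the straight path between noise and data, and regress a velocity field toward the constant target v = x1 - x0. This is a minimal NumPy sketch of that objective on a toy linear model, not TangoFlux's actual architecture; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Toy "noise" and "data" points standing in for latent audio features.
x0 = rng.normal(size=dim)          # noise sample
x1 = rng.normal(size=dim) + 3.0    # data sample
t = rng.uniform()                  # random time on the path

def target_velocity(x0, x1):
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * x1."""
    return x1 - x0

# Toy linear velocity model v_theta(x_t, t) = W @ [x_t; t],
# a stand-in for the transformer a real model would use.
W = rng.normal(scale=0.1, size=(dim, dim + 1))

def fm_loss(W, x0, x1, t):
    """Conditional flow-matching loss at one (x0, x1, t) triple."""
    x_t = (1 - t) * x0 + t * x1
    inp = np.append(x_t, t)
    pred = W @ inp
    return np.mean((pred - target_velocity(x0, x1)) ** 2)

loss_before = fm_loss(W, x0, x1, t)

# A few plain gradient steps on this single sample.
lr = 0.05
for _ in range(200):
    x_t = (1 - t) * x0 + t * x1
    inp = np.append(x_t, t)
    residual = W @ inp - target_velocity(x0, x1)
    W -= lr * 2 * np.outer(residual, inp) / dim

loss_after = fm_loss(W, x0, x1, t)
```

At inference time, models trained this way integrate the learned velocity field from noise toward data in a handful of ODE steps, which is where the "super fast" generation comes from.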
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
OpenMusic: SOTA Text-to-music (TTM) Generation
Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Mustango: Toward Controllable Text-to-Music Generation
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS, synchronizing the audio timing to the subtitle timestamps.
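The timing part of a subtitle-to-audio pipeline reduces to parsing cue timestamps and scheduling each synthesized clip at its cue's start. This is a stdlib-only sketch of that step; the synthesis itself (Coqui-ai TTS in the project above) is omitted, and the SRT parser handles only well-formed input. All names are illustrative.

```python
import re
from datetime import timedelta

SRT_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def parse_time(stamp):
    """'00:01:02,500' -> timedelta of 1 min 2.5 s."""
    h, m, s, ms = map(int, SRT_TIME.match(stamp).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)

def parse_srt(text):
    """Return (start, end, text) cues from SRT-formatted text."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        # lines[0] is the cue index, lines[1] the timing line
        start, end = (parse_time(t.strip()) for t in lines[1].split("-->"))
        cues.append((start, end, " ".join(lines[2:])))
    return cues

sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,000
General Kenobi."""

cues = parse_srt(sample)
# Offsets (in seconds) at which each synthesized clip would be placed
# on the output timeline.
schedule = [(start.total_seconds(), text) for start, _, text in cues]
```

A real pipeline would then synthesize each cue's text, time-stretch or pad the clip to fit its `end - start` window, and mix the clips onto a silent track at these offsets.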
PyTorch implementation of SoundCTM