text-to-audio

open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Python
9419
4 months ago

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python
1877
10 days ago

HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.

Python
1199
6 days ago
declare-lab/tango
Python
1196
2 months ago

[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

Python
1044
16 days ago

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching

Jupyter Notebook
787
2 months ago

PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

Python
655
1 year ago

Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch

Python
550
3 years ago

Mustango: Toward Controllable Text-to-Music Generation

Python
375
4 months ago

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python
310
3 months ago

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Jupyter Notebook
280
14 days ago

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

Jupyter Notebook
187
2 years ago

Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Python
119
4 years ago

Subtitle to audio, generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing according to subtitle time.

Python
118
2 years ago
Python
97
6 months ago