text-to-audio
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
A family of diffusion models for text-to-audio generation.
A web UI for various audio-related neural networks
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
OpenMusic: SOTA Text-to-music (TTM) Generation
Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Mustango: Toward Controllable Text-to-Music Generation
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Subtitle to audio: generates audio from any subtitle file using Coqui-ai TTS and synchronizes the audio timing with the subtitle timestamps.
PyTorch implementation of SoundCTM
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
Creative Text-to-Audio Generation via Synthesizer Programming @ ICML'24
AudioLDM text-to-audio Colab