audio-generation
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: text, audio, video, and image generation, voice cloning, and distributed P2P inference
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open
AudioLDM: Generate speech, sound effects, music and beyond, with text.
TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)
Audio generation using diffusion models, in PyTorch.
A timeline of the latest AI models for audio generation, starting in 2023!
Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch
A family of diffusion models for text-to-audio generation.
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Official PyTorch implementation of BigVGAN (ICLR 2023)
AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation
Source code for "Taming Visually Guided Sound Generation" (oral at BMVC 2021)
A list of sound, audio, and music development tools covering machine learning, audio generation, audio signal processing, sound synthesis, spatial audio, music information retrieval, music generation, speech recognition, speech synthesis, singing voice synthesis, and more.
Python library for designing and training your own Diffusion Models with PyTorch.