multimodality
A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents in acing any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
A Comparative Framework for Multimodal Recommender Systems
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
Automated modeling and machine learning framework FEDOT
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
LLM2CLIP makes a SOTA pretrained CLIP model more SOTA than ever.
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
A knowledge base construction engine for richly formatted data
Sequence-to-Sequence Framework in PyTorch
Towards Generalist Biomedical AI
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
DANCE: a deep learning library and benchmark platform for single-cell analysis