multimodality

A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

Python
2573
4 years ago

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents handle any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

Python
2250
9 months ago

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

1710
1 year ago

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python
1156
1 day ago

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python
976
1 year ago

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

TeX
758
2 years ago

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Python
639
8 months ago

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Python
540
3 months ago

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

Python
536
2 months ago

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Python
483
3 years ago

A CLI tool and Python module for generating images from text using guided diffusion and CLIP from OpenAI.

Python
462
4 years ago

A knowledge base construction engine for richly formatted data

Python
411
4 years ago

Python
374
12 hours ago

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images.

Python
363
2 years ago