multimodality

A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

Python · 2571 stars · updated 3 years ago

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

Python · 2073 stars · updated 5 months ago

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

1601 stars · updated 8 months ago

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python · 935 stars · updated 1 year ago

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python · 891 stars · updated 1 month ago

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

TeX · 755 stars · updated 1 year ago

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Python · 634 stars · updated 4 months ago

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Python · 526 stars · updated 10 months ago

LLM2CLIP makes a SOTA pretrained CLIP model even more SOTA.

Python · 506 stars · updated 1 month ago

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Python · 478 stars · updated 2 years ago

A CLI tool and Python module for generating images from text using guided diffusion and CLIP from OpenAI.

Python · 462 stars · updated 3 years ago

A knowledge base construction engine for richly formatted data

Python · 409 stars · updated 4 years ago

An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images.

Python · 360 stars · updated 1 year ago

Python · 357 stars · updated 11 hours ago