multimodality

A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

Python
2572
4 years ago

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents handle any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

Python
2295
1 year ago

A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

1745
1 year ago

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python
1364
13 days ago

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python
989
1 year ago

[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

TeX
753
2 years ago

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Python
638
9 months ago

LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.

Python
551
3 months ago

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Python
547
4 months ago

X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

Python
484
3 years ago

A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.

Python
463
4 years ago

A knowledge base construction engine for richly formatted data

Python
411
4 years ago

Python
377
19 days ago

[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"

Python
375
3 months ago