# multi-modality

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 23331 stars · updated 1 year ago

Simple command-line tool for text-to-image generation using OpenAI's CLIP and SIREN (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun

Python · 4353 stars · updated 3 years ago
EvolvingLMMs-Lab/Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python · 3266 stars · updated 1 year ago

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Python · 533 stars · updated 1 year ago

[CVPR 2025] MINIMA: Modality Invariant Image Matching

Python · 470 stars · updated 2 months ago

An open-source implementation of Gemini, the Google model touted to "eclipse ChatGPT"

Python · 459 stars · updated 1 day ago

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Python · 439 stars · updated 1 year ago

[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

Python · 427 stars · updated 3 months ago

An effortless plug-and-play optimizer claimed to cut model training costs by 50%: a new optimizer that is 2x faster than Adam on LLMs.

Python · 379 stars · updated 1 year ago

Official repository for VisionZip (CVPR 2025)

Python · 337 stars · updated 1 month ago

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Python · 287 stars · updated 1 year ago