multi-modality

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built toward GPT-4V-level capabilities and beyond. A minimal inference sketch follows below.

Python · 23,657 stars · updated 1 year ago
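
For a quick way to try a LLaVA-style model, here is a minimal inference sketch. It uses the community Hugging Face `transformers` port and the converted `llava-hf/llava-1.5-7b-hf` checkpoint rather than this repository's own training and serving stack; the model ID, prompt format, and example image URL are assumptions drawn from the Hugging Face documentation, not code from the repo.

```python
# Minimal LLaVA-1.5 inference sketch via the Hugging Face `transformers` port.
# Assumes: transformers >= 4.36, accelerate (for device_map="auto"), and a GPU
# with enough memory for the 7B checkpoint in fp16. Not the official repo's CLI.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is just a placeholder example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# LLaVA-1.5 chat format: the <image> token marks where visual tokens are inserted.
prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))
```
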

Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun; a minimal sketch of the idea follows below.

Python · 4,344 stars · updated 4 years ago
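
The core idea is simple enough to sketch: optimize the weights of a small SIREN (a sine-activation MLP mapping pixel coordinates to RGB) so that the image it renders scores high CLIP similarity with a text prompt. The sketch below assumes PyTorch and OpenAI's `clip` package; the prompt, network size, learning rate, and step count are illustrative, and the real tool adds augmented crops, scheduling, and a CLI on top of a loop like this.

```python
# CLIP-guided image synthesis with a SIREN network: a minimal sketch.
# Assumes: torch and OpenAI's CLIP package (pip install git+https://github.com/openai/CLIP.git).
# CLIP's usual normalization and augmented-crop preprocessing are omitted for brevity.
import torch
import torch.nn as nn
import clip

class Siren(nn.Module):
    """Implicit image: maps (x, y) coordinates to RGB with sine activations."""
    def __init__(self, hidden=256, depth=4, w0=30.0):
        super().__init__()
        dims = [2] + [hidden] * depth + [3]
        self.w0 = w0
        self.layers = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims, dims[1:]))

    def forward(self, coords):
        x = coords
        for i, layer in enumerate(self.layers[:-1]):
            x = torch.sin((self.w0 if i == 0 else 1.0) * layer(x))
        return torch.sigmoid(self.layers[-1](x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep fp32 so gradients flow cleanly on GPU
for p in model.parameters():
    p.requires_grad_(False)  # freeze CLIP; only the SIREN is optimized

text = clip.tokenize(["a watercolor painting of a fox"]).to(device)
text_feat = model.encode_text(text)

siren = Siren().to(device)
opt = torch.optim.Adam(siren.parameters(), lr=1e-4)

# Coordinate grid for a 224x224 image (CLIP ViT-B/32 input resolution).
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 224), torch.linspace(-1, 1, 224), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2).to(device)

for step in range(500):
    image = siren(coords).reshape(224, 224, 3).permute(2, 0, 1).unsqueeze(0)
    img_feat = model.encode_image(image)
    # Maximize cosine similarity between the rendered image and the prompt.
    loss = -torch.cosine_similarity(img_feat, text_feat).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```
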
EvolvingLMMs-Lab/Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python · 3,270 stars · updated 2 years ago

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python · 625 stars · updated 10 days ago

Chatbot Arena meets multi-modality! Multi-Modality Arena lets you benchmark vision-language models side by side with images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Python · 539 stars · updated 1 year ago

[CVPR 2025] MINIMA: Modality Invariant Image Matching

Python · 492 stars · updated 6 days ago

An open-source implementation of Gemini, the Google model claimed to "eclipse ChatGPT".

Python · 458 stars · updated 5 days ago

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Python · 441 stars · updated 1 year ago

[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

Python · 427 stars · updated 4 months ago

Effortless plug-and-play optimizer that cuts model training costs by 50%; a new optimizer that is 2x faster than Adam on LLMs.

Python · 382 stars · updated 1 year ago

Official repository for VisionZip (CVPR 2025)

Python · 352 stars · updated 2 months ago

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Python · 294 stars · updated 1 year ago