
# visual-instruction-tuning

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python · 816 stars · updated 2 months ago

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

Python · 453 stars · updated 3 months ago

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python · 289 stars · updated 2 months ago

A collection of visual instruction tuning datasets.

Python · 76 stars · updated 1 year ago

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)

Python · 64 stars · updated 1 year ago

Gamified Adversarial Prompting (GAP): crowdsourcing AI-weakness-targeting data through gamification. Boosts model performance with community-driven, strategic data collection.

Python · 26 stars · updated 6 months ago

[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

Python · 16 stars · updated 9 months ago

Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey

5 stars · updated 1 year ago