# visual-language-learning

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 22263 stars · updated 8 months ago
EvolvingLMMs-Lab/Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python · 3248 stars · updated 1 year ago

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Python · 276 stars · updated 7 months ago

(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions

Python · 257 stars · updated 1 year ago

🧘🏻‍♂️ KarmaVLM (相生): A family of efficient and powerful visual language models.

Python · 88 stars · updated 1 year ago

Build a simple, basic multimodal large model from scratch. 🤖

Python · 34 stars · updated 10 months ago

[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding

Python · 11 stars · updated 6 months ago

PyTorch implementation of OpenAI's CLIP model for image classification, visual search, and visual question answering (VQA).

Jupyter Notebook · 2 stars · updated 7 months ago
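
The CLIP entry above relies on a mechanism shared by many models in this list: zero-shot classification by comparing image and text embeddings in a joint space. The sketch below illustrates only the scoring step, using random placeholder arrays in place of real CLIP encoder outputs; the embedding dimension, label prompts, and temperature of 100 are illustrative assumptions, not details taken from that repository.

```python
import numpy as np

# CLIP-style zero-shot classification: an image is assigned the label whose
# text embedding is most similar (by cosine similarity) to the image embedding.
# The embeddings below are random placeholders standing in for encoder outputs.
rng = np.random.default_rng(0)

def normalize(x):
    # Project embeddings onto the unit sphere so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_emb = normalize(rng.standard_normal((len(labels), 512)))  # one embedding per label prompt
image_emb = normalize(rng.standard_normal((1, 512)))           # one embedding per image

# Scaled cosine similarities, then a softmax over the candidate labels.
logits = 100.0 * image_emb @ text_emb.T
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
print(labels[int(probs.argmax())])
```

With real CLIP encoders, `text_emb` would come from prompts like those above and `image_emb` from the image tower; the scoring step is unchanged.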