# large-multimodal-models

✨✨ VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python · 2235 stars · updated 23 days ago
OpenAdaptAI/OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA): AI-first process automation with Large Language (LLMs), Large Action (LAMs), Large Multimodal (LMMs), and Visual Language (VLMs) models.

Python · 1239 stars · updated 1 month ago

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Python · 1051 stars · updated 6 months ago

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills

Python · 739 stars · updated 1 year ago

LLaVA-Mini is a unified large multimodal model (LMM) that supports efficient understanding of images, high-resolution images, and videos.

Python · 453 stars · updated 3 months ago

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Python · 331 stars · updated 8 months ago

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.

Python · 289 stars · updated 2 months ago

Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

Python · 184 stars · updated 1 year ago

[NeurIPS 2024] Evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

Python · 175 stars · updated 7 months ago

The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".

Python · 98 stars · updated 5 months ago

[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

Python · 85 stars · updated 8 months ago