Repository navigation

large-vision-language-models

Website
Wikipedia

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

instruction-tuning instruction-following large-vision-language-model visual-instruction-tuning multi-modality in-context-learning large-language-models large-vision-language-models multimodal-chain-of-thought multimodal-in-context-learning multimodal-large-language-models chain-of-thought

16387

1063

11 天前

ShareGPT4Omni / ShareGPT4Video

[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"

ChatGPT gpt gpt-4v large-language-models large-multimodal-models large-vision-language-models sora text-to-video

Python

1076

1 年前

zhaochen0110 / Awesome_Think_With_Images

Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.

large-vision-language-models

999

5 天前

NVlabs / DoRA

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

commonsense-reasoning 深度学习深度神经网络 instruction-tuning large-language-models large-vision-language-models lora parameter-efficient-fine-tuning parameter-efficient-tuning vision-and-language

Python

862

1 年前

MME-Benchmarks / Video-MME

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

large-language-models large-vision-language-models mme multimodal-large-language-models Video

653

1 个月前

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

aigc large-language-models large-vision-language-models multimodal-generation multimodal-large-language-models multimodal-models multimodality text-to-3d text-to-audio text-to-image text-to-speech text-to-video 大语言模型 mllm

HTML

511

6 个月前

Paranioar / Awesome_Matching_Pretraining_Transfering

The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.

cross-modal-retrieval 教程 Awesome Lists image-text-matching image-text-retrieval large-language-models large-vision-language-models multimodal-pretraining parameter-efficient-fine-tuning vision-and-language multimodal-large-language-models 大语言模型 text-to-image-generation text-to-image-synthesis text-to-video-generation

430

9 天前

burglarhobbit / Awesome-Medical-Large-Language-Models

Curated papers on Large Language Models in Healthcare and Medical domain

large-language-models multimodal-large-language-models large-vision-language-models

358

4 个月前

tianyi-lab / HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark vlms gpt-4 gpt-4v llava benchmarks hallucination 大语言模型 lmm large-language-models large-vision-language-models

Python

300

1 年前

ShareGPT4Omni / ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

ChatGPT gpt gpt-4v gpt4v instruction-tuning language-model large-language-models large-multimodal-models large-vision-language-models vision-language-model eccv2024

Python

236

1 年前

khuangaf / Awesome-Chart-Understanding

A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.

Awesome Lists chart-understanding large-vision-language-models

222

4 个月前

MMStar-Benchmark / MMStar

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

evaluation large-language-models large-multimodal-models large-vision-language-model large-vision-language-models 大语言模型 multimodal multimodal-learning multimodality visual-question-answering

Python

196

1 年前

NishilBalar / Awesome-LVLM-Hallucination

up-to-date curated list of state-of-the-art Large vision language models hallucinations research work, papers & resources

hallucination large-vision-language-models multimodal-large-language-models large-language-models 大语言模型 mllm

179

1 天前

itsqyh / Awesome-LMMs-Mechanistic-Interpretability

A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository aggregates surveys, blog posts, and research papers that explore how LMMs represent, transform, and align multimodal information internally.

large-multimodal-models large-vision-language-models large-language-models generative generative-model

139

2 个月前

mbzuai-oryx / GeoPixel

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.

foundation-models large-multimodal-models large-vision-language-models remote-sensing segmentation-models

Python

121

4 个月前

llmbev / talk2bev

Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)

autonomous-driving gpt-4 large-language-models large-vision-language-models occupancy-grid-map

Python

113

1 年前

yu-rp / apiprompting

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

large-multimodal-models large-vision-language-model large-vision-language-models prompting vision-language-model visual-prompting

Python

103

1 年前

hiyamdebary / EarthDial

[CVPR 2025 🔥] EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues.

foundation-models large-multimodal-models large-vision-language-models remote-sensing

Python

3 个月前

yfzhang114 / LLaVA-Align

[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strategy.

hallucination large-vision-language-models

Python

7 个月前

ys-zong / VLGuard

[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.

alignment large-language-models large-vision-language-models safety vision-language-model

Python

9 个月前