mllm

[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python
4012
8 days ago

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

Python
3605
2 months ago
Python
3565
5 months ago

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python
2249
4 months ago

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

1596
4 months ago

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python
1289
1 month ago
Python
1044
1 year ago

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python
876
2 months ago

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python
832
2 months ago

OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

Python
808
5 months ago

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

763
2 months ago

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Python
638
9 months ago

[NeurIPS' 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

JavaScript
635
7 days ago

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python
577
1 year ago