#mllm

SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python
3827
1 month ago
Python
3548
3 months ago

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

Python
3534
22 days ago

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python
2237
3 months ago

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

1558
3 months ago

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python
1222
21 days ago
Python
1029
9 months ago

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Python
853
11 days ago

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python
828
17 hours ago

OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

Python
770
3 months ago

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applications.

738
18 days ago

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Python
639
8 months ago

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

JavaScript
606
12 days ago

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python
579
1 year ago