#mllm

Python
21088
2 months ago
Python
3485
6 months ago

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

Python
3298
4 days ago

SpatialLM: Large Language Model for Spatial Understanding

Python
3047
22 days ago

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python
2157
4 months ago

Pioneering Multimodal Reasoning with CoT

Python
2101
10 days ago

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

1454
5 days ago

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python
1052
4 days ago
Python
1014
5 months ago

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Python
816
2 months ago

Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Python
661
21 hours ago

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

660
5 days ago

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Python
634
4 months ago

OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

Python
615
5 days ago

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python
559
10 months ago