mllm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
SpatialLM: Large Language Model for Spatial Understanding
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Agent S: an open agentic framework that uses computers like a human
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Pioneering Multimodal Reasoning with CoT
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
A family of lightweight multimodal models.
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and related datasets and applications.
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
OpenEMMA: a permissively licensed, open-source "reproduction" of Waymo's EMMA model.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization