lmm

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models 大语言模型 lmm multimodality vision-language-model vlm 人工智能

Python

2295

227

1 year ago

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

foundation-models lmm vision-and-language vision-language-model llm-agent

Python

918

2 months ago

NVlabs / EAGLE

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Demo gpt4 huggingface llama llama3 llava lmm mllm 大语言模型 large-language-models

Python

876

2 months ago

LLaVA-VL / LLaVA-Interactive-Demo

LLaVA-Interactive-Demo

lmm multimodal

Python

378

1 year ago

tianyi-lab / HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark vlms gpt-4 gpt-4v llava benchmarks hallucination 大语言模型 lmm large-language-models large-vision-language-models

Python

300

1 year ago

CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025

connector lmm mllm

Python

269

4 months ago

mbzuai-oryx / Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

大语言模型 lmm Video grounding transcription

Python

257

2 months ago

TIGER-AI-Lab / Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024]

language vision lmm mllm Video vlm multimodal

Python

227

6 months ago

Javis603 / Discord-AIBot

🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking

人工智能聊天机器人 ChatGPT claude deepseek Discord discord-bot Discord.JS gemini 大语言模型 lmm Node.js openai xai

JavaScript

223

7 months ago