
#grounding

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

Python
2250
9 months ago

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Python
652
1 month ago

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python
579
1 year ago

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"

Python
468
1 year ago

We perform functional grounding of LLMs' knowledge in BabyAI-Text

Python
268
1 year ago

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python
257
15 days ago

[TPAMI 2025] Towards Visual Grounding: A Survey

Shell
212
15 hours ago

UI-Venus is a native UI agent based on the Qwen2.5-VL multimodal large language model, designed to perform precise GUI element grounding and effective navigation using only screenshots as input.

Python
138
19 hours ago

[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.

Jupyter Notebook
130
5 days ago

Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)

Python
75
10 months ago

Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)

Python
71
5 years ago

[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

Python
69
5 years ago

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Python
61
4 years ago

[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.

Python
54
5 days ago

Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"

Python
52
1 month ago

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Python
52
2 years ago