grounding

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

Python
2073
5 months ago

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Python
560
10 months ago

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Python
538
2 months ago

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"

Python
463
1 year ago

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python
256
1 year ago

We perform functional grounding of LLMs' knowledge in BabyAI-Text

Python
254
8 months ago

[TPAMI reviewing] Towards Visual Grounding: A Survey

Shell
138
1 month ago

[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.

Jupyter Notebook
120
3 months ago

Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)

Python
76
6 months ago

Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)

Python
71
5 years ago

[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

Python
67
5 years ago

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

Python
59
4 years ago

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Python
50
1 year ago

[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.

Python
45
4 days ago

Python
42
1 year ago

Code for CVPR'18 "Grounding Referring Expressions in Images by Variational Context"

Python
30
7 years ago