grounding
Agent S: an open agentic framework that uses computers like a human
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
awesome grounding: A curated list of research papers in visual grounding
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
CLIPort: What and Where Pathways for Robotic Manipulation
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
We perform functional grounding of LLMs' knowledge in BabyAI-Text
[TPAMI reviewing] Towards Visual Grounding: A Survey
Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)
Hierarchical Universal Language Conditioned Policies
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Code for CVPR'18 "Grounding Referring Expressions in Images by Variational Context"