
grounding


simular-ai / Agent-S

Agent S: an open agentic framework that uses computers like a human

agent-computer-interface ai-agents computer-automation gui-agents memory mllm planning retrieval-augmented-generation in-context-reinforcement-learning computer-use grounding

Python

2355

251

2 days ago

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models Large language models lmm multimodality vision-language-model vlm Artificial intelligence

Python

2073

184

5 months ago

TheShadow29 / awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

Computer vision Natural language processing grounding Awesome Lists papers arxiv video-understanding captioning-videos embodied-agent multimodal-deep-learning language-grounding Bukkit

1069

100

2 years ago

FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

grounding Large language models mllm large-language-models foundation-models llama llama2 multimodal vision-language-model

Python

560

44

10 months ago

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Natural language processing Robotics Deep learning grounding vision-language manipulation Computer vision PyTorch vision vision-and-language

Python

538

73

2 months ago

cliport / cliport

CLIPort: What and Where Pathways for Robotic Manipulation

clip Robotics vision Deep learning Natural language processing grounding vision-language manipulation PyTorch rearrangement Computer vision

Jupyter Notebook

490

88

1 year ago

allenai / lumos

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"

decision-making grounding maths planning question-answering reasoning web-agent

Python

463

29

1 year ago

mbzuai-oryx / Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Large language models lmm Video grounding transcription

Python

256

12

1 year ago

flowersteam / Grounding_LLMs_with_online_RL

We perform functional grounding of LLMs' knowledge in BabyAI-Text

grounding language-model reinforcement-learning

Python

254

30

8 months ago

linhuixiao / Awesome-Visual-Grounding

[TPAMI reviewing] Towards Visual Grounding: A Survey

grounding Awesome Lists survey

Shell

138

17

1 month ago

linhuixiao / CLIP-VG

[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.

Jupyter Notebook

120

8

3 months ago

TIGER-AI-Lab / StructLM

Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)

grounding Large language models reasoning

Python

76

9

6 months ago

TheShadow29 / zsgnet-pytorch

Official implementation of the ICCV19 oral paper Zero-Shot Grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)

grounding vision Natural language processing objects

Python

71

12

5 years ago

lukashermann / hulc

Hierarchical Universal Language Conditioned Policies

Computer vision Deep learning grounding manipulation Natural language processing PyTorch Robotics vision vision-and-language vision-language

Python

71

9

1 year ago

TheShadow29 / vognet-pytorch

[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

grounding Video pytorch-implementation vision vision-and-language Natural language processing captioning-videos

Python

67

7

5 years ago

TheShadow29 / VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

vision vision-and-language grounding Natural language processing Video srl captioning-videos captioning

Python

59

8

4 years ago

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

pretrained-language-model PyTorch transformer zero-shot-learning cross-modal grounding semantic

Python

50

8

1 year ago

linhuixiao / HiVG

[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.

Python

45

4

4 days ago

[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data

Computer vision Deep learning grounding manipulation Natural language processing PyTorch Robotics vision vision-and-language vision-language

Python

42

4

1 year ago

Code for CVPR'18 "Grounding Referring Expressions in Images by Variational Context"

cvpr2018 Tensorflow grounding

Python

30

2

7 years ago