#language-model-agent

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

Python
367
2 months ago

τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Python
202
1 month ago

A data construction and evaluation framework to quantify privacy norm awareness of language models (LMs) and emerging privacy risk of LM agents. (NeurIPS 2024 D&B)

Python
29
6 months ago