llm-evaluation-metrics

A one-stop repository for large language model (LLM) unlearning. Supports TOFU and MUSE, and provides an easily extensible framework for new datasets, evaluations, methods, and other benchmarks.

Python · 218 stars · Updated 2 days ago

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Python · 36 stars · Updated 9 months ago

Create an evaluation framework for your LLM-based app, incorporate it into your test suite, and lay the foundation for monitoring (a test-suite sketch follows this entry).

Jupyter Notebook · 7 stars · Updated 3 months ago
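
To illustrate what "incorporate it into your test suite" can look like in practice, here is a minimal, hypothetical sketch, not taken from the repository above: a parametrized pytest test that scores model answers against references with a simple keyword-overlap metric. The `generate_answer` helper, the evaluation cases, and the 0.5 threshold are all assumptions for illustration.

```python
# A minimal sketch (not from the repo) of an LLM evaluation inside a pytest
# test suite: each case pairs a prompt with a reference answer, and a simple
# keyword-overlap score gates the assertion.
import pytest


def generate_answer(prompt: str) -> str:
    # Hypothetical placeholder: replace with the actual call into your LLM-based app.
    return "Paris is the capital of France."


def keyword_overlap(answer: str, reference: str) -> float:
    # Fraction of reference tokens that also appear in the model's answer.
    ref_tokens = set(reference.lower().split())
    ans_tokens = set(answer.lower().split())
    return len(ref_tokens & ans_tokens) / max(len(ref_tokens), 1)


EVAL_CASES = [
    ("What is the capital of France?", "Paris is the capital of France."),
    ("Name the capital city of France.", "The capital of France is Paris."),
]


@pytest.mark.parametrize("prompt,reference", EVAL_CASES)
def test_llm_answers_match_reference(prompt, reference):
    answer = generate_answer(prompt)
    # Threshold is an assumption; tune it for your own metric and data.
    assert keyword_overlap(answer, reference) >= 0.5
```

Running this under `pytest` turns each evaluation case into an individual pass/fail result, which is what makes the same cases reusable as a monitoring baseline later.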

This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package (a rough interface sketch follows this entry).

Python · 0 stars · Updated 8 months ago
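
As a rough illustration of what such a Streamlit evaluation interface can look like, here is a minimal sketch. It is not the repository's code and does not use the actual beyondllm API; `score_response` is a hypothetical placeholder for whatever evaluation backend the app plugs in.

```python
# Minimal Streamlit sketch of an LLM evaluation UI. `score_response` is a
# hypothetical stand-in; a real app would call its evaluation backend
# (e.g. the beyondllm package) here instead.
import streamlit as st


def score_response(answer: str, reference: str) -> float:
    # Toy word-overlap score between the model answer and the reference.
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / max(len(ref), 1)


st.title("LLM response evaluation (sketch)")

question = st.text_area("Question")
answer = st.text_area("Model answer")
reference = st.text_area("Reference answer")

if st.button("Evaluate") and answer and reference:
    score = score_response(answer, reference)
    st.metric("Overlap score", f"{score:.2f}")
```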