llm-evaluation-metrics

The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features (benchmarks, methods, evaluations, models, etc.) are easily extensible.

Python
381
2 months ago

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Python
37
1 year ago

Run a prompt against all, or some, of your models running on Ollama. Creates web pages with the output, performance statistics, and model info. All in a single Bash shell script. (A minimal sketch of the core loop follows this entry.)

Shell
11
1 month ago
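
The core loop this kind of script automates can be sketched briefly. The following is a hypothetical Python sketch (the repository itself is a Bash script) that shells out to the standard `ollama` CLI (`ollama list`, `ollama run`) to run one prompt against every local model and time each response; it omits the web-page and model-info reporting.

```python
# Minimal sketch (not the repository's script): run one prompt against every
# locally installed Ollama model via the `ollama` CLI and time each run.
import subprocess
import time

PROMPT = "Explain the difference between precision and recall in one sentence."

# `ollama list` prints a table; the first column of each non-header row is the model name.
listing = subprocess.run(["ollama", "list"], capture_output=True, text=True, check=True)
models = [line.split()[0] for line in listing.stdout.splitlines()[1:] if line.strip()]

for model in models:
    start = time.time()
    result = subprocess.run(
        ["ollama", "run", model, PROMPT],
        capture_output=True, text=True, check=True,
    )
    elapsed = time.time() - start
    print(f"== {model} ({elapsed:.1f}s) ==")
    print(result.stdout.strip())
```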

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation. (A minimal sketch of such a test follows this entry.)

Jupyter Notebook
7
5 months ago
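
As a rough illustration of the idea, here is a hypothetical sketch of wiring a tiny LLM evaluation into a pytest test suite; `generate` is a stand-in for your app's model call and the substring checks are placeholder metrics, not this repo's framework.

```python
# Minimal sketch (not this repo's framework): run LLM evaluations as pytest cases.
import pytest

def generate(prompt: str) -> str:
    # Hypothetical stand-in for your application's real LLM call.
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "Summarize: The sky is blue.": "The sky is blue.",
    }
    return canned.get(prompt, "")

CASES = [
    ("What is the capital of France?", "paris"),
    ("Summarize: The sky is blue.", "sky"),
]

@pytest.mark.parametrize("prompt,expected_substring", CASES)
def test_llm_answer_contains_expected_fact(prompt, expected_substring):
    answer = generate(prompt).lower()
    assert expected_substring in answer
```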

An estimated confidence score that outputs generated by large language models (LLMs) are not hallucinated. (A generic sketch of one such score follows this entry.)

Python
6
2 months ago
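
One common way to estimate such confidence is from token log-probabilities. The sketch below is a generic illustration of that idea, not this repository's method, and assumes you already have per-token log-probabilities from your model or API.

```python
# Generic sketch (not this repository's method): use the mean token log-probability
# of a generation as a crude proxy for how confident the model was in its output.
import math

def mean_logprob_confidence(token_logprobs: list[float]) -> float:
    """Map average token log-probability to a (0, 1] confidence-style score."""
    if not token_logprobs:
        return 0.0
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg)  # geometric-mean token probability

# Example with per-token log-probabilities from a generation.
print(mean_logprob_confidence([-0.1, -0.3, -0.05, -0.2]))  # ~0.85, i.e. fairly confident
```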

Evaluates LLM responses and computes their accuracy.

Python
1
3 months ago

Python
1
1 month ago

This repo contains a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.

Python
0
1 year ago