
llm-evaluation-metrics


confident-ai / deepeval

The LLM Evaluation Framework

evaluation-metrics evaluation-framework llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Python

6013

524

3 hours ago

locuslab / open-unlearning

A one-stop repository for large language model (LLM) unlearning. Supports TOFU and MUSE, and is an easily extensible framework for new datasets, evaluations, methods, and other benchmarks.

privacy-protection benchmarks llm-evaluation-metrics llms Open Source

Python

218

2 days ago

cvs-health / langfair

LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

Artificial intelligence bias bias-detection fairness fairness-ai fairness-ml fairness-testing large-language-models Large language models responsible-ai Python ai-safety llm-evaluation llm-evaluation-framework llm-evaluation-metrics

Python

201

21 hours ago

zhuohaoyu / KIEval

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

explainable-ai Large language models llm-evaluation llm-evaluation-framework llm-evaluation-metrics Machine learning

Python

9 months ago

pyladiesams / eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM based app. Incorporate it into your test suite. Lay the monitoring foundation.

Large language models llmops llms workshop llm-eval llm-evaluation-framework llm-evaluation-metrics llm-monitoring

Jupyter Notebook

3 months ago

ritwickbhargav80 / quick-llm-model-evaluations

This repo is for a Streamlit application that provides a user-friendly interface for evaluating large language models (LLMs) using the beyondllm package.

llm-evaluation-metrics llms retrieval-augmented-generation Streamlit

Python

8 months ago