# evaluation-metrics

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, Agno, OpenAI Agents SDK, LangChain, AutoGen, AG2, and CamelAI.

Python
4795
2 hours ago

"A White-Box Guide to Building Large Models" (《大模型白盒子构建指南》): Tiny-Universe, built entirely from scratch by hand

Jupyter Notebook
3564
2 days ago

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python
1826
10 hours ago

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

Python
1773
1 year ago

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Jupyter Notebook
1540
7 months ago

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

Jupyter Notebook
836
1 year ago

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)

Python
782
1 year ago

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Python
775
6 months ago
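As a reference point for what such a toolkit computes, word error rate is the word-level Levenshtein distance between reference and hypothesis, normalized by reference length. A minimal sketch in plain Python (the `wer` function name is illustrative, not this library's API):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deleted word out of a 6-word reference ≈ 0.167
score = wer("the cat sat on the mat", "the cat sat on mat")
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is an error rate rather than a bounded similarity score.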

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

Python
622
1 year ago
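Several of these metrics are short closed-form expressions; PSNR, for example, is just a log-scaled inverse of the mean squared error. A standalone NumPy sketch (illustrative, not this package's API):

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Constant error of 10 intensity levels against an 8-bit range ≈ 28.1 dB
clean = np.zeros((8, 8))
noisy = np.full((8, 8), 10.0)
score = psnr(clean, noisy)
```

Higher PSNR means the images are closer; identical images yield infinity, which is why libraries often report it alongside bounded metrics such as SSIM.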

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

Python
582
13 days ago
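Ranking evaluation typically centers on metrics such as NDCG, which discounts graded relevance by rank position and normalizes against the ideal ordering. A minimal pure-Python sketch (illustrative, not this library's interface):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_rels):
    """NDCG: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

# Relevance grades of results in the order the system returned them
score = ndcg([3, 2, 3, 0, 1, 2])
```

A perfectly ordered result list scores 1.0; any inversion of a more-relevant item below a less-relevant one pulls the score below 1.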

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).

Python
476
2 years ago
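The n-gram extraction and frequency-list tasks mentioned above are compact to illustrate. A hypothetical helper in plain Python (not PyNLPl's actual API):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of successive n-grams from a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps over the lazy dog".split()
bigrams = ngrams(tokens, 2)      # e.g. ('the', 'quick'), ('quick', 'brown'), ...
freq = Counter(bigrams)          # frequency list over the extracted bigrams
```

Counting n-grams this way is the building block for simple count-based language models: conditional probabilities fall out of dividing n-gram counts by their (n-1)-gram prefix counts.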

[RA-L '25 & IROS '25] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.

C++
396
1 month ago
Jupyter Notebook
366
1 year ago

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

350
19 days ago

Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.

Python
317
1 month ago

Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Python
304
4 months ago

Python SDK for running evaluations on LLM generated responses

Python
292
2 months ago