#evaluation

mlflow/mlflow

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

Python
22334
11 hours ago
langfuse/langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript
16765
2 days ago

Supercharge Your LLM Application Evaluations 🚀

Python
10987
1 day ago

Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

TypeScript
8584
13 minutes ago

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

Python
8502
5 days ago

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python
6125
6 days ago

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.

Go
4969
4 days ago
Marker-Inc-Korea/AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

Python
4332
7 days ago
MichaelGrupp/evo
Python
3961
7 hours ago

Arbitrary expression evaluation for golang

Go
3904
6 months ago

SuperCLUE: A comprehensive benchmark for Chinese general-purpose foundation models

3258
1 month ago

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.

HTML
3137
1 year ago

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python
3134
2 days ago

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python
3122
7 days ago