evals

The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.

TypeScript
12111
10 hours ago

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including OpenAI Agents SDK, CrewAI, LangChain, AutoGen, AG2, and CamelAI.

Python
4230
15 hours ago

Kiln-AI/Kiln

The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.

Python
3390
14 hours ago

Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.

TypeScript
1861
6 hours ago

Test your LLM-powered apps with TypeScript. No API key required.

TypeScript
528
1 day ago

[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding

Jupyter Notebook
146
1 month ago

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

TypeScript
89
1 day ago

A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).

Python
34
8 months ago

Benchmarking Large Language Models for FHIR

29
5 months ago

Jupyter Notebook
22
9 days ago

Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.

Go
18
19 days ago

An implementation of Anthropic's paper and essay "A statistical approach to model evaluations".

Python
16
13 days ago

Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025

Jupyter Notebook
15
16 days ago

Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.

Python
8
2 months ago

MCP for Root Signals Evaluation Platform

Python
5
2 days ago

The OAIEvals Collector: a robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, and TimescaleDB integrations, plus containerized deployment with Docker. Streamlines OAI-Evals data management with a low barrier to entry.

Go
3
1 year ago