evals
The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.
AI Observability & Evaluation
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI.
The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets.
Laminar - open-source all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite
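For orientation, a minimal sketch of what a SQLite-backed retrieval step can look like. The schema and the `retrieve` helper below are illustrative assumptions, not RAGLite's actual API, and FTS5 is assumed to be available in the local sqlite3 build.

```python
import sqlite3

# Illustrative SQLite-backed keyword retrieval; not RAGLite's API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc_id, text)")

docs = [
    ("paper-1", "Retrieval-augmented generation grounds answers in retrieved text."),
    ("paper-2", "Vector databases store embeddings for similarity search."),
]
conn.executemany("INSERT INTO chunks VALUES (?, ?)", docs)

def retrieve(query: str, k: int = 3):
    """Return the top-k chunks ranked by FTS5's built-in BM25 ordering."""
    return conn.execute(
        "SELECT doc_id, text FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()

# Retrieved chunks become the grounding context for whichever LLM you call next.
context = retrieve("retrieval generation")
prompt = "Answer using only this context:\n" + "\n".join(text for _, text in context)
print(prompt)
```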
Test your LLM-powered apps with TypeScript. No API key required.
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
A library for evaluating Retrieval-Augmented Generation (RAG) systems the traditional way.
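As a concrete example of the traditional retrieval metrics such a library reports, here is a short sketch of recall@k and reciprocal rank; the function names and toy data are mine, not the library's.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Toy query: a ranked retriever output and the ground-truth relevant documents.
retrieved = ["d7", "d2", "d9", "d1"]
relevant = {"d2", "d1"}
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
print(reciprocal_rank(retrieved, relevant))   # 0.5 (first hit at rank 2)
```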
Evalica, your favourite evaluation toolkit
Benchmarking Large Language Models for FHIR
Go Artificial Intelligence (GAI) helps you work with foundational models, large language models, and other AI models.
An implementation of Anthropic's paper and essay "A statistical approach to model evaluations".
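The paper's central recommendation is to treat eval questions as a random sample: report the standard error of the mean score, and use paired per-question differences when comparing two models on the same questions. A minimal sketch with toy data (not the repository's code):

```python
import numpy as np

def mean_and_sem(scores):
    """Mean score with its standard error, s / sqrt(n)."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))

# Per-question scores (1 = correct, 0 = incorrect) for two models on the
# same 500 eval questions; synthetic data for illustration only.
rng = np.random.default_rng(0)
model_a = rng.integers(0, 2, size=500).astype(float)
model_b = np.clip(model_a + (rng.random(500) < 0.05), 0.0, 1.0)

for name, scores in [("model A", model_a), ("model B", model_b)]:
    mean, sem = mean_and_sem(scores)
    print(f"{name}: {mean:.3f} ± {1.96 * sem:.3f} (95% CI)")

# Paired comparison: the standard error of the per-question difference is
# typically much smaller than combining the two independent standard errors.
diff_mean, diff_sem = mean_and_sem(model_b - model_a)
print(f"B - A: {diff_mean:.3f} ± {1.96 * diff_sem:.3f} (95% CI)")
```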
Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025
Root Signals Python SDK
Our curated collection of templates. Use these patterns to set up your AI projects for evaluation with Openlayer.
MCP for Root Signals Evaluation Platform
The OAIEvals Collector: a robust, Go-based metric collector for EVALS data. Supports Kafka, Elastic, Loki, InfluxDB, and TimescaleDB integrations, plus containerized deployment with Docker. Streamlines OAI-Evals data management with a low barrier to entry!