vllm

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.

Jupyter Notebook
17761
1 day ago

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you can run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or even on your laptop.

Python
8404
15 hours ago

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python
7698
5 days ago

Structured data extraction and instruction calling with ML, LLM and Vision LLM

Python
4957
2 months ago

Supercharge Your LLM with the Fastest KV Cache Layer

Python
4785
5 hours ago
xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python
4403
14 hours ago
kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++
3782
11 hours ago

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

Python
3454
11 hours ago

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python
2897
20 hours ago

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Python
2051
1 day ago

Model swapping for llama.cpp (or any local OpenAI-compatible server)

Go
1341
2 days ago

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

Go
1077
7 months ago

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Go
1041
1 day ago

Community-maintained hardware plugin for vLLM on Ascend

Python
1016
16 hours ago

Evaluate your LLM's response with Prometheus and GPT4 💯

Python
979
4 months ago

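A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to extend. It supports large models such as DeepSeek and Qwen3. This one-stop large-model application development project is built on Dify, LangChain/LangGraph, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-driven chart-based data Q&A through ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.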

JavaScript
945
12 hours ago

LLM notes, covering model inference, transformer model structure, and LLM framework code analysis.

Python
812
10 days ago

Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python
738
12 hours ago

A Python library powered by Language Models (LLMs) for conversational data discovery and analysis.

Python
687
10 days ago