Repository navigation

#

vllm

Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end to end problems using Llama model family and using them on various provider services

Jupyter Notebook
17922
1 天前

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

Python
8595
4 天前

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python
8060
12 天前

Supercharge Your LLM with the Fastest KV Cache Layer

Python
5474
2 天前

Structured data extraction and instruction calling with ML, LLM and Vision LLM

Python
5004
5 天前
xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python
4580
2 个月前
kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++
4049
10 小时前

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

Python
3522
5 天前

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python
3284
9 天前

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Python
2189
16 小时前

UltraRAG 2.0: Less Code, Lower Barrier, Faster Deployment! MCP-based low-code RAG framework, enabling researchers to build complex pipelines to creative innovation.

Python
1679
21 小时前

Model swapping for llama.cpp (or any local OpenAI API compatible server)

Go
1628
6 天前

一个轻量级、支持全链路且易于二次开发的大模型应用项目(Large Model Data Assistant) 支持DeepSeek/Qwen3等大模型 基于 Dify 、LangChain/LangGraph、Ollama&Vllm、Sanic 和 Text2SQL 📊 等技术构建的一站式大模型应用开发项目,采用 Vue3、TypeScript 和 Vite 5 打造现代UI。它支持通过 ECharts 📈 实现基于大模型的数据图形化问答,具备处理 CSV 文件 📂 表格问答的能力。同时,能方便对接第三方开源 RAG 系统 检索系统 🌐等,以支持广泛的通用知识问答。

JavaScript
1228
2 天前

Community maintained hardware plugin for vLLM on Ascend

Python
1178
4 天前

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

Go
1090
9 个月前

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Go
1075
1 天前

Evaluate your LLM's response with Prometheus and GPT4 💯

Python
996
5 个月前

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.

Python
827
19 天前