vllm

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family on various provider services.

artificial-intelligence finetuning langchain llama llama2 large-language-models machine-learning Python PyTorch vllm

Jupyter Notebook

17922

2624

1 day ago

xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

ggml PyTorch chatglm deployment flan-t5 large-language-models wizardlm artificial-intelligence machine-learning Whisper inference openai-api mistral gemma llama llamacpp vllm qwen llama3 glm4

Python

8595

744

4 days ago

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

transformers vllm large-language-models raylib reinforcement-learning-from-human-feedback reinforcement-learning openai-o1 proximal-policy-optimization

Python

8060

786

12 days ago

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

amd CUDA inference kv-cache large-language-models PyTorch rocm vllm fast speed

Python

5474

622

2 days ago

katanaml / sparrow

Structured data extraction and instruction calling with ML, LLM and Vision LLM

machine-learning huggingface-transformers natural-language-processing computer-vision gpt large-language-models rag vllm

Python

5004

500

5 days ago

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

flash-attention tensorrt-llm vllm llm-inference deepseek deepseek-v3 deepseek-r1 qwen3

Python

4580

309

2 months ago

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

inference kvcache large-language-models rdma sglang vllm disaggregation

C++

4049

387

10 hours ago

gpustack / gpustack

Simple, scalable AI model deployment on GPU clusters

ascend CUDA deepseek distributed-inference genai inference llama llamacpp large-language-models maas metal openai qwen rocm vllm mindie llm-inference llm-serving local-ai heterogeneous-cluster

Python

3800

383

7 days ago

PaddlePaddle / FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

serving ernie large-language-models inference llm-serving openai vllm ernie-45 ernie-45-vl

Python

3522

634

5 days ago

skyzh / tiny-llm

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

course large-language-models Python qwen qwen2 serving vllm

Python

3284

216

9 days ago

containers / ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

artificial-intelligence containers CUDA hip inference-server intel llamacpp large-language-models podman vllm

Python

2189

262

16 hours ago

OpenBMB / UltraRAG

UltraRAG 2.0: Less Code, Lower Barrier, Faster Deployment! An MCP-based low-code RAG framework that lets researchers build complex pipelines and focus on creative innovation.

embedding large-language-models rag easy mcp mcp-client mcp-server openai vllm gpt qwen deepseek jina sentence-transformers

Python

1679

138

21 hours ago

mostlygeek / llama-swap

Model swapping for llama.cpp (or any local OpenAI API compatible server)

Go llama llamacpp localllama localllm openai openai-api vllm

1628

105

6 days ago

vllm-project / semantic-router

Intelligent Mixture-of-Models Router for Efficient LLM Inference

Go huggingface-transformers pii-detection Python Rust vllm ai-gateway envoyproxy fine-tuning Kubernetes prompt-engineering

1602

179

10 hours ago

apconw / sanic-web

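A lightweight, full-pipeline large-model application project that is easy to extend (Large Model Data Assistant). It supports large models such as DeepSeek and Qwen3 and is a one-stop large-model application development project built on Dify, LangChain/LangGraph, Ollama&Vllm, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-based chart question answering through ECharts 📈 and can handle table question answering over CSV files 📂. It can also easily connect to third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge question answering.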

bigdata dify ollama vllm large-language-models qwen echarts sanic text2sql Vue.js Python deepseek-r1 mcp langchain Neo4j

JavaScript

1228

219

2 days ago

vllm-project / vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

ascend inference large-language-models llm-serving llmops mlops model-serving transformer vllm

Python

1178

468

4 days ago

bricks-cloud / BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.