Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, and TTS.
kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.
Arks is a cloud-native inference framework running on Kubernetes
A tool for benchmarking LLMs on Modal
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
A guide to structured generation using constrained decoding
Controllable Language Model Interactions in TypeScript
Experiments with LLMs in clouds (powered by SGLang)
The Private AI Setup Dream Guide for Demos automates installing the software needed for a local, private AI setup, using AI models (LLMs and diffusion models) for use cases such as general assistance, business ideas, coding, image generation, systems administration, marketing, and planning.
llmd is an LLM daemonset that provides model management and gets large language models up and running; it can use llama.cpp, vLLM, or SGLang as the inference backend.
Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks and scenarios, enabling fair and reproducible comparisons for researchers & practitioners.