sglang
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning and long-form speech generation.
LLM quantization (compression) toolkit with hardware acceleration support for NVIDIA CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via Hugging Face, vLLM, and SGLang.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs).
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image-editing, and text-to-video models.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
A workload for deploying LLM inference services on Kubernetes
Arks is a cloud-native inference framework running on Kubernetes.
A tool for benchmarking LLMs on Modal
A high-performance RDMA distributed storage system for fast LLM Inference and GPU Training
DeepSeek-V3 and R1 (671B) Throughput Benchmarks on 8xH100
A guide to structured generation using constrained decoding
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
Controllable Language Model Interactions in TypeScript
Experiments with LLMs in clouds (powered by SGLang)