sglang

kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++
4049
11 hours ago

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech generation.

Python
971
7 days ago

LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python
811
4 hours ago

High-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.

Python
530
5 months ago

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python
413
3 days ago

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs).

Go
286
12 hours ago

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Go
261
19 days ago

gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, TTS, text-to-image, image editing, and text-to-video models.

Python
212
10 days ago
Python
98
1 day ago

A workload for deploying LLM inference services on Kubernetes

Go
75
12 days ago

Arks is a cloud-native inference framework running on Kubernetes

Go
43
5 days ago

A tool for benchmarking LLMs on Modal

Python
43
1 month ago

A high-performance RDMA distributed storage system for fast LLM Inference and GPU Training

C++
34
17 days ago

DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks

Python
16
7 months ago

Kernel Library Wheel for SGLang

HTML
12
3 days ago

A guide to structured generation using constrained decoding

Jupyter Notebook
11
1 year ago

Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration

Python
11
11 days ago

Controllable Language Model Interactions in TypeScript

TypeScript
9
1 year ago

Experiments with LLMs in clouds (powered by SGLang)

Python
6
1 month ago