Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, and TTS.
kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.
Arks is a cloud-native inference framework running on Kubernetes
A tool for benchmarking LLMs on Modal
DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks
A guide to structured generation using constrained decoding
Controllable Language Model Interactions in TypeScript
Experiments with LLMs in clouds (powered by SGLang)
The Private AI Setup Dream Guide for Demos automates installing the software needed for a local, private AI setup, using AI models (LLMs and diffusion models) for use cases such as general assistance, business ideas, coding, image generation, systems administration, marketing, and planning.
llmd is an LLM daemonset that provides model management and gets large language models up and running; it can use llama.cpp, vLLM, or SGLang as the inference backend.
Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks and scenarios, enabling fair and reproducible comparisons for researchers & practitioners.