sglang

kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 3786 stars · updated 4 minutes ago

Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python · 737 stars · updated 32 minutes ago

Based on models such as SparkTTS and OrpheusTTS, this provides high-quality Chinese text-to-speech and voice-cloning services.

Python · 511 stars · updated 3 months ago

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python · 317 stars · updated 2 days ago

☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!

Go · 239 stars · updated 2 days ago

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Go · 226 stars · updated 2 days ago

gpt_server is an open-source framework for production-grade deployment of LLMs, embedding, reranker, ASR, and TTS services.

Python · 205 stars · updated 2 days ago

kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference.

Python · 58 stars · updated 2 days ago

Arks is a cloud-native inference framework running on Kubernetes

Go · 43 stars · updated 9 days ago

A tool for benchmarking LLMs on Modal

Python · 42 stars · updated 10 hours ago

DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks

Python · 12 stars · updated 5 months ago

A guide to structured generation using constrained decoding

Jupyter Notebook · 11 stars · updated 1 year ago

Kernel Library Wheel for SGLang

HTML · 11 stars · updated 2 days ago

Controllable Language Model Interactions in TypeScript

TypeScript · 9 stars · updated 1 year ago

Experiments with LLMs in clouds (powered by SGLang)

Python · 7 stars · updated 5 days ago

The Private AI Setup Dream Guide for Demos automates installation of the software needed for a local, private AI setup, using AI models (LLMs and diffusion models) for use cases such as general assistance, business ideas, coding, image generation, systems administration, marketing, and planning.

Shell · 3 stars · updated 1 month ago

llmd is an LLM daemonset: it provides model management and gets large language models up and running, using llama.cpp, vLLM, or SGLang as the serving backend.

Makefile · 3 stars · updated 6 months ago

Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks and scenarios, enabling fair and reproducible comparisons for researchers & practitioners.

Python · 3 stars · updated 17 days ago

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

3 stars · updated 1 month ago