
#llm-serving

Python
45263
1 hour ago

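This project aims to share the technical principles behind large models and hands-on experience with them (LLM engineering and deploying LLM applications in production).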

HTML
16655
6 days ago
Python
13339
1 hour ago

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

Python
11153
3 days ago
Python
7693
7 hours ago
bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python
7631
2 days ago

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python
2954
11 hours ago

MoBA: Mixture of Block Attention for Long-Context LLMs

Python
1744
16 days ago

RayLLM - LLMs on Ray (Archived). Read README for more info.

Python
1262
1 month ago

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python
1091
38 minutes ago

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++
885
5 days ago

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

Python
838
5 days ago

A throughput-oriented high-performance serving framework for LLMs

Cuda
796
7 months ago

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++
702
3 months ago

🧬 Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.

Go
490
2 hours ago

Community maintained hardware plugin for vLLM on Ascend

Python
485
21 hours ago