llm-serving
Python · 59,425 stars · updated 7 hours ago

This project aims to share the technical principles of large language models together with hands-on experience (LLM engineering and production deployment of LLM applications).
HTML · 21,137 stars · updated 2 months ago

Python · 18,602 stars · updated 7 minutes ago

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
Python · 11,818 stars · updated 5 days ago
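
Because the server above exposes an OpenAI-compatible endpoint, it can normally be called with the standard openai Python client. This is a minimal sketch; the base URL, API key, and model name are placeholder assumptions, not values taken from the project.

```python
# Minimal sketch of calling an OpenAI-compatible serving endpoint.
# base_url, api_key, and the model name are illustrative assumptions;
# substitute whatever the deployed server actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",   # hypothetical local deployment
    api_key="not-needed-for-local",        # many local servers ignore the key
)

response = client.chat.completions.create(
    model="deepseek-r1",                   # placeholder model identifier
    messages=[{"role": "user", "content": "What does an LLM serving layer do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```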

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.
C++ · 11,772 stars · updated 19 hours ago
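
To give a feel for that Python API, the sketch below follows the LLM / SamplingParams pattern from TensorRT LLM's documented quickstart; exact class names, arguments, and output fields should be verified against the installed release, and the model checkpoint is a placeholder.

```python
# Hedged sketch of TensorRT LLM's high-level Python API (pattern only;
# verify names and arguments against the version you have installed).
from tensorrt_llm import LLM, SamplingParams

# Load a Hugging Face model ID or local checkpoint (placeholder model).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling = SamplingParams(temperature=0.8, top_p=0.95)

# Batched generation; each result carries the generated text.
for output in llm.generate(["The key idea behind paged attention is"], sampling):
    print(output.outputs[0].text)
```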

bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Python · 8,109 stars · updated 5 days ago
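
As a feel for the programming model, a minimal service in the style of BentoML's current decorator API might look like the sketch below; the service name and the trivial echo logic are invented for illustration, and decorator details can differ between BentoML versions.

```python
# Hedged sketch of a BentoML-style inference API (assumes the
# @bentoml.service / @bentoml.api decorators of BentoML 1.2+).
import bentoml


@bentoml.service
class EchoLLM:
    """Illustrative service; swap the body for real model inference."""

    @bentoml.api
    def generate(self, prompt: str) -> str:
        # Placeholder logic standing in for a model call.
        return f"[generated] {prompt}"
```

Such a service is typically started with the bentoml serve CLI and then called over HTTP or from a BentoML client.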

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle.
Python · 3,522 stars · updated 5 days ago

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
Python · 3,444 stars · updated 4 months ago
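
The defining trait of a multi-LoRA server is that a single base model stays resident on the GPU while the LoRA adapter is chosen per request. The sketch below illustrates that request shape; the URL, endpoint path, and adapter_id field are assumptions to check against the server's actual API.

```python
# Hedged sketch of per-request LoRA adapter selection against a
# multi-LoRA inference server. The endpoint and JSON fields are
# illustrative assumptions, not a documented API.
import requests

payload = {
    "inputs": "Classify the sentiment of: 'The battery life is fantastic.'",
    "parameters": {
        "adapter_id": "acme/sentiment-lora-v2",  # hypothetical fine-tune to apply
        "max_new_tokens": 16,
    },
}

resp = requests.post("http://localhost:8080/generate", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```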

MoBA: Mixture of Block Attention for Long-Context LLMs
Python · 1,911 stars · updated 6 months ago

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Python · 1,292 stars · updated 11 hours ago

RayLLM - LLMs on Ray (archived). See the README for more information.
Python · 1,262 stars · updated 7 months ago

Community-maintained hardware plugin for vLLM on Ascend.
Python · 1,178 stars · updated 4 days ago

A highly optimized LLM inference acceleration engine for Llama and its variants.
C++ · 900 stars · updated 3 months ago

A throughput-oriented high-performance serving framework for LLMs.
Jupyter Notebook · 894 stars · updated 18 days ago

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
C++ · 874 stars · updated 10 hours ago