#model-serving

Python
55740
1 hour ago
bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python
7996
2 days ago

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Python
4456
1 day ago

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

4368
9 months ago

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

Python
3917
9 days ago

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python
3522
14 hours ago

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python
3384
3 months ago

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

3200
1 month ago

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

Python
1572
15 hours ago

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python
1239
1 day ago

Community maintained hardware plugin for vLLM on Ascend

Python
1016
19 hours ago

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++
900
1 month ago

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook
872
8 days ago

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

Python
852
19 days ago