Repository navigation

model-serving

Website
Wikipedia

A high-throughput and memory-efficient inference and serving engine for LLMs

gpt 大语言模型 PyTorch model-serving transformer llm-serving inference llama amd CUDA tpu deepseek qwen blackwell deepseek-v3 gpt-oss kimi moe openai qwen3

Python

59432

10514

10 小时前

bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

model-serving mlops llmops generative-ai llm-inference model-inference-service inference-platform 深度学习 llm-serving 机器学习 Python multimodal ml-engineering 大语言模型 ai-inference

Python

8109

876

5 天前

kserve / kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

knative 机器学习 model-interpretability model-serving istio kubeflow 人工智能 Tensorflow PyTorch xgboost Kubernetes service-mesh kserve Hacktoberfest mlops genai llm-inference cncf

Python

4612

1266

1 天前

ahkarami / Deep-Learning-in-Production

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

深度学习深度神经网络 Python PyTorch tesnorflow Keras mxnet caffe2 production serving C++model-serving 教程 Flask REST API React Angular Tensorflow

4377

696

1 年前

FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

federated-learning 深度学习 distributed-training edge-ai 机器学习 on-device-training inference-engine mlops model-deployment model-serving ai-agent

Python

3939

763

2 个月前

ModelTC / LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

深度学习 gpt llama 大语言模型 model-serving 自然语言处理 openai-triton

Python

3633

280

2 天前

predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

fine-tuning gpt llama 大语言模型 llm-inference llm-serving llmops lora model-serving PyTorch transformers

Python

3444

271

4 个月前

HuaizhengZhang / AI-Infra-from-Zero-to-Hero

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

large-language-models ai-infra genai mlsys model-serving model-training

3312

340

2 个月前

beclab / Olares

Olares: An Open-Source Personal Cloud to Reclaim Your Data

Kubernetes 自托管 home-automation homelab edge-ai homeserver local-ai ai-agents model-serving mcp home-cloud home-server

2516

8 小时前

tensorchord / envd

🏕️ Reproducible development environment

developer-tools development-environment Docker buildkit Hacktoberfest llmops mlops model-serving

2148

161

4 天前

microsoft / aici

AICI: Prompts as (Wasm) Programs

人工智能 Rust WebAssembly wasmtime inference language-model 大语言模型 llm-framework llm-inference llm-serving llmops model-serving transformer

Rust

2049

8 个月前

mlrun / mlrun

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

mlops Python 数据科学机器学习 data-engineering experiment-tracking model-serving workflow Kubernetes

Python

1596

279

9 小时前