inference-server

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

Python · 2052 stars · updated 2 hours ago

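As a sketch of what that local serving looks like in practice: assuming `ramalama serve` exposes an OpenAI-compatible chat endpoint on localhost:8080 (the port, path, and model name below are assumptions, not documented defaults), a client call could look like this:

```python
# A minimal sketch, assuming `ramalama serve` exposes an
# OpenAI-compatible chat endpoint on localhost:8080; the port,
# path, and model name are assumptions, not documented defaults.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "granite",  # hypothetical model name
        "messages": [
            {"role": "user", "content": "Summarize what an inference server does."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```
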
A REST API for Caffe using Docker and Go

C++ · 419 stars · updated 7 years ago

A no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.

Python · 280 stars · updated 3 years ago

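No-code detection APIs like this are typically exercised by posting an image and reading back detections as JSON; a minimal sketch, where the route, port, and form field name are hypothetical:

```python
# Hypothetical call to a no-code detection API: POST an image,
# read back detections as JSON. The route, port, and form field
# name are illustrative assumptions.
import requests

with open("street.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:4343/detect",  # hypothetical route and port
        files={"image": f},
        timeout=60,
    )
resp.raise_for_status()
print(resp.json())  # e.g. a list of {label, confidence, bounding box} entries
```
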
Work with LLMs in a local environment using containers

TypeScript · 249 stars · updated 4 hours ago

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

C++ · 166 stars · updated 7 days ago

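A minimal sketch of calling a REST inference endpoint like this from Python; the URL path, model name, and payload shape here are illustrative assumptions rather than the server's documented contract:

```python
# Hypothetical REST call to an ONNX inference endpoint. The path,
# model name, and payload layout are illustrative assumptions;
# the server's own API docs define the real contract.
import requests

payload = {"inputs": {"input": [[1.0, 2.0, 3.0, 4.0]]}}  # one 4-feature row
resp = requests.post(
    "http://localhost:8001/v1/models/my_model/predict",  # hypothetical path
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```
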
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints

Scala · 160 stars · updated 10 months ago

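Unlike tensor-oriented ONNX inputs, PMML models consume named feature fields, so scoring requests carry records rather than arrays; a sketch with a hypothetical endpoint, model name, and field names:

```python
# Sketch of scoring a PMML model over REST: PMML inputs are named
# feature fields rather than tensors, so the request carries records.
# The endpoint path, model name, and field names are hypothetical.
import requests

record = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}
resp = requests.post(
    "http://localhost:9090/v1/models/iris:predict",  # hypothetical path
    json={"records": [record]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```
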
K3ai is a lightweight, fully automated AI-infrastructure-in-a-box solution that lets anyone experiment quickly with Kubeflow pipelines. It runs on anything from edge devices to laptops.

PowerShell · 101 stars · updated 4 years ago

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch). Includes a PyTorch -> ONNX -> TensorRT converter and inference pipelines (TensorRT, and Triton serving multiple formats). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

Python · 33 stars · updated 4 years ago
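
Clients usually talk to Triton through its client library; a minimal sketch using tritonclient's HTTP API, where localhost:8000 is Triton's default HTTP port and the model name, tensor names, shape, and FP32 dtype are assumptions for illustration:

```python
# Minimal Triton HTTP client sketch (pip install tritonclient[http]).
# localhost:8000 is Triton's default HTTP port; the model name,
# tensor names, shape, and FP32 dtype are assumptions for illustration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one normalized image batch as a float32 tensor.
image = np.random.rand(1, 3, 608, 608).astype(np.float32)
inp = httpclient.InferInput("input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
out = httpclient.InferRequestedOutput("output")

result = client.infer(model_name="craft", inputs=[inp], outputs=[out])
print(result.as_numpy("output").shape)  # score maps for text detection
```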