triton-inference-server

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Python · 3004 stars · updated 3 days ago

Python · 1508 stars · updated 4 months ago

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server

C++ · 283 stars · updated 3 years ago
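
As a rough illustration of how a client would query a deployment like the YOLOv4 one above, here is a minimal sketch using Triton's Python HTTP client. The model name ("yolov4"), the tensor names, and the 608x608 input shape are assumptions for illustration, not values taken from that repository.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Assumed tensor names and shape; check
# client.get_model_metadata("yolov4") for the real ones in a deployment.
image = np.zeros((1, 3, 608, 608), dtype=np.float32)  # placeholder input
infer_input = httpclient.InferInput("input", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)

result = client.infer(
    model_name="yolov4",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("detections")],
)
print(result.as_numpy("detections").shape)
```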

OpenAI-compatible API for the TensorRT-LLM Triton backend

Rust · 205 stars · updated 9 months ago
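
A server exposing an OpenAI-compatible API can typically be driven with the stock `openai` Python client by pointing `base_url` at it. The host, port, path, and model name below are assumptions for illustration, not documented values from this repository.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local, OpenAI-compatible
# server. base_url and model name are assumptions; servers like this
# usually ignore the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```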

Deep learning deployment framework supporting tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. Dual-language compatible with Python and C++, offering scalability, extensibility, and high performance; helps users quickly deploy models and serve them through HTTP/RPC interfaces.

C++ · 157 stars · updated 1 month ago

NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT, for both Jetson and x86_64 with a CUDA-capable GPU.

C++ · 112 stars · updated 2 months ago

Traffic analysis at a roundabout using computer vision

Python · 82 stars · updated 1 month ago

Jupyter Notebook · 51 stars · updated 1 year ago

Compare multiple optimization methods on Triton to improve model-serving performance

Jupyter Notebook · 50 stars · updated 1 year ago
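
A common way to compare such optimizations is Triton's bundled `perf_analyzer` tool, which sweeps client concurrency and reports throughput and latency. The sketch below is one possible invocation; the server address and model name ("my_model") are assumptions, and `perf_analyzer` must be on PATH.

```python
import subprocess

# Sweep client concurrency from 1 to 8 against a loaded model and let
# perf_analyzer print throughput/latency for each level.
subprocess.run(
    [
        "perf_analyzer",
        "-m", "my_model",
        "-u", "localhost:8000",
        "--concurrency-range", "1:8",
    ],
    check=True,
)
```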

Builds a recommender system with PyTorch + Redis + Elasticsearch + Feast + Triton + Flask: vector recall, DeepFM ranking, and a web application.

Python · 48 stars · updated 2 years ago

Tiny configuration for Triton Inference Server

Python · 45 stars · updated 3 months ago
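
For context, a "tiny" Triton configuration usually amounts to a model repository directory plus a short `config.pbtxt`. The sketch below writes such a layout; the model name, backend, and tensor shapes are illustrative assumptions, not this repository's contents.

```python
from pathlib import Path

# Minimal config.pbtxt for an ONNX model; all names/dims are assumptions.
CONFIG = """\
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  { name: "input", data_type: TYPE_FP32, dims: [ 3, 224, 224 ] }
]
output [
  { name: "output", data_type: TYPE_FP32, dims: [ 1000 ] }
]
"""

# Triton expects: model_repository/<model>/config.pbtxt and
# model_repository/<model>/<version>/model.onnx
model_dir = Path("model_repository/my_model")
(model_dir / "1").mkdir(parents=True, exist_ok=True)
(model_dir / "config.pbtxt").write_text(CONFIG)
# Place the actual model file at model_dir / "1" / "model.onnx".
```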

Set up CI for DL from scratch on AGX or PC: cuda/cudnn/TensorRT/onnx2trt/onnxruntime/onnxsim/Pytorch/Triton-Inference-Server/Bazel/Tesseract/PaddleOCR/NVIDIA-docker/minIO/Supervisord.

Python · 43 stars · updated 2 years ago

Provides an ensemble model to deploy a YoloV8 ONNX model to Triton

Python · 35 stars · updated 2 years ago
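
A Triton "ensemble" like the one above chains several models (for example preprocess -> ONNX model -> postprocess) server-side, so clients send raw images and receive detections directly. Below is a minimal sketch of such an ensemble config; every model name and tensor name is an assumption, not taken from this repository.

```python
from pathlib import Path

# Sketch of an ensemble config.pbtxt chaining preprocessing, the
# YOLOv8 ONNX model, and postprocessing inside Triton. All model and
# tensor names are illustrative assumptions.
ENSEMBLE_CONFIG = """\
name: "yolov8_ensemble"
platform: "ensemble"
max_batch_size: 8
input [ { name: "raw_image", data_type: TYPE_UINT8, dims: [ -1 ] } ]
output [ { name: "detections", data_type: TYPE_FP32, dims: [ -1, 6 ] } ]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "raw_image" value: "raw_image" }
      output_map { key: "tensor" value: "preprocessed" }
    },
    {
      model_name: "yolov8_onnx"
      model_version: -1
      input_map { key: "images" value: "preprocessed" }
      output_map { key: "output0" value: "raw_output" }
    },
    {
      model_name: "postprocess"
      model_version: -1
      input_map { key: "raw_output" value: "raw_output" }
      output_map { key: "detections" value: "detections" }
    }
  ]
}
"""

ensemble_dir = Path("model_repository/yolov8_ensemble")
(ensemble_dir / "1").mkdir(parents=True, exist_ok=True)  # version dir stays empty
(ensemble_dir / "config.pbtxt").write_text(ENSEMBLE_CONFIG)
```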

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch); includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

Python · 33 stars · updated 4 years ago
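
The PyTorch -> ONNX step of such a converter typically looks like the sketch below. The stand-in model, input resolution, and tensor names are assumptions; the real pipeline would export the trained CRAFT detector instead.

```python
import torch

# Stand-in for the CRAFT detector; the export call is the same either way.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
).eval()

dummy = torch.randn(1, 3, 768, 768)  # assumed input resolution
torch.onnx.export(
    model,
    dummy,
    "craft.onnx",
    input_names=["input"],
    output_names=["score_map"],
    dynamic_axes={"input": {0: "batch"}, "score_map": {0: "batch"}},
    opset_version=13,
)
# The resulting craft.onnx can then be built into a TensorRT engine,
# e.g. with `trtexec --onnx=craft.onnx --saveEngine=craft.plan`.
```
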
Python · 31 stars · updated 2 months ago