fastertransformer
Python · 6137 · updated 2 days ago

Serving example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
Python · 20 · updated 2 years ago

Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton Inference Server
Python · 5 · updated 2 years ago

This repository is a code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
Python · 0 · updated 2 years ago