fastertransformer
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Python · 6137 · 2 days ago
Serving Example of CodeGen-350M-Mono-GPTJ on Triton Inference Server with Docker and Kubernetes
Python · 20 · 2 years ago
Deploy KoGPT with Triton Inference Server
Shell · 14 · 2 years ago
Tutorial on how to deploy a scalable autoregressive causal language model transformer using NVIDIA Triton Inference Server
Python · 5 · 2 years ago
This repository is a code sample for serving Large Language Models (LLMs) on a Google Kubernetes Engine (GKE) cluster with GPUs, running NVIDIA Triton Inference Server with the FasterTransformer backend.
Python · 0 · 2 years ago