tensorrt-llm

xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python
4579
2 months ago

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Jupyter Notebook
470
1 year ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

334
2 months ago

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Python
313
10 days ago

OpenAI compatible API for TensorRT LLM triton backend

Rust
214
1 year ago

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It works with both Python and C++, offering scalability, extensibility, and high performance. It helps users quickly deploy models and provide services through HTTP/RPC interfaces.

C++
167
5 months ago

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

Python
155
5 months ago

This repository contains AI Bootcamp material that consists of a workflow for LLMs

Jupyter Notebook
93
2 months ago

TensorRT-LLM server with Structured Outputs (JSON) built with Rust

Rust
59
5 months ago

A tool for benchmarking LLMs on Modal

Python
43
1 month ago

Add-in for the new Outlook that adds new LLM features (Composition, Summarizing, Q&A). It uses a local LLM via Nvidia TensorRT-LLM

Python
42
4 months ago

Getting started with TensorRT-LLM using BLOOM as a case study

Jupyter Notebook
23
2 years ago

AI Infra LLM infer/ tensorrt-llm/ vllm

Python
21
10 months ago

Accelerating large-model inference frameworks to make LLMs fly

Python
20
1 year ago

LLM tutorial materials, including but not limited to NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.

Python
11
3 months ago

MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum possible loss for the player, making it a popular choice for developing AI opponents in various game scenarios.

5
1 hour ago