# int8

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python · 2378 · 2 days ago
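
As a quick illustration of what the INT8 schemes listed on this page do, here is a minimal NumPy sketch of symmetric per-tensor quantization and dequantization; the function names and the max-abs scale rule are illustrative only and are not taken from any of the listed projects.

```python
# Illustrative symmetric per-tensor INT8 quantize/dequantize (not from any listed repository).
import numpy as np


def quantize_int8(x: np.ndarray):
    """Map a float32 tensor to int8 using a single per-tensor scale."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-8)   # guard against all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale


x = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(x)
print("max abs reconstruction error:", float(np.abs(x - dequantize_int8(q, s)).max()))
```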

An innovative library for efficient LLM inference via low-bit quantization

C++ · 350 · 8 months ago

Reimplementation of RetinaFace using C++ and TensorRT

C++ · 297 · 5 years ago

TensorRT INT8 quantization of a YOLOv5 ONNX model

Python · 182 · 4 years ago

TensorRT INT8 quantization and deployment of the YOLOv5s model, measured at 3.3 ms per frame!

C++ · 168 · 4 years ago

RepVGG TensorRT INT8 quantization, with measured inference under 1 ms per frame!

Python · 63 · 4 years ago

A simple pipeline for INT8 quantization based on TensorRT.

Python · 62 · 3 years ago

👀 Apply YOLOv8 exported with ONNX or TensorRT (FP16, INT8) to a real-time camera feed

Python · 49 · 1 year ago

NanoDet INT8 quantization, with measured inference at 2 ms per frame!

C++ · 37 · 4 years ago

NCNN + INT8 + YOLOv4 quantization and real-time inference

C++ · 24 · 4 years ago

TensorRT INT8 Python sample (a minimal calibration sketch appears at the end of this list).

Python · 14 · 6 years ago

INT8 calibrator for an ONNX model with dynamic batch_size at the input and an NMS module at the output. C++ implementation.

C++ · 13 · 6 months ago

A LLaMA2-7b chatbot with memory, running on CPU and optimized with smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.

Python · 13 · 1 year ago

RISC-V Vector kernel C/LLVM-IR generator

C · 7 · 4 months ago

LLM-Lora-PEFT_accumulate explores optimizations for large language models (LLMs) using PEFT, LoRA, and QLoRA; experiments and implementations that improve LLM efficiency are welcome as contributions.

Jupyter Notebook · 6 · 2 years ago

MT-Yolov6 TensorRT inference with Python.

Python · 6 · 3 years ago

Visual Basic .NET · 4 · 5 months ago
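
Several of the TensorRT entries above (the YOLOv5/YOLOv5s/RepVGG repositories, the "simple pipeline" project, and the Python sample) revolve around the same post-training INT8 calibration flow. The sketch below is a minimal, non-authoritative illustration of that flow with the TensorRT Python API: the ONNX filename `yolov5s.onnx` and the `load_calibration_batches()` helper are hypothetical placeholders, and the builder calls shown assume TensorRT 8.x with pycuda installed.

```python
# Minimal sketch of post-training INT8 calibration with TensorRT (assumes TRT 8.x + pycuda).
# yolov5s.onnx and load_calibration_batches() are placeholders, not taken from any listed repo.
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt


class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT and caches the resulting scales."""

    def __init__(self, batches, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)          # iterable of float32 NCHW arrays, all same shape
        self.cache_file = cache_file
        self.current = next(self.batches)
        self.batch_size = self.current.shape[0]
        self.device_mem = cuda.mem_alloc(self.current.nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        if self.current is None:
            return None                       # no more data -> calibration finished
        cuda.memcpy_htod(self.device_mem, np.ascontiguousarray(self.current))
        self.current = next(self.batches, None)
        return [int(self.device_mem)]         # single-input network assumed

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)


def load_calibration_batches():
    """Placeholder: yield a few random batches shaped like the network input."""
    for _ in range(8):
        yield np.random.rand(1, 3, 640, 640).astype(np.float32)


logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("yolov5s.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(load_calibration_batches())
engine_bytes = builder.build_serialized_network(network, config)  # serialized INT8 engine
```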