int8
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
An innovative library for efficient LLM inference via low-bit quantization
Reimplementation of RetinaFace using C++ and TensorRT
A simple pipeline for INT8 quantization based on TensorRT.
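Such a pipeline usually amounts to parsing an ONNX model, enabling the INT8 builder flag, and supplying an entropy calibrator. A minimal sketch in Python, assuming the TensorRT 8.x bindings plus pycuda; the model path, input shape, and random calibration data are illustrative, not part of any repo above.

```python
# Sketch: build an INT8 TensorRT engine from an ONNX model.
# Assumes TensorRT 8.x Python bindings + pycuda; paths and shapes are illustrative.
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds calibration batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, cache_file="calib.cache"):
        super().__init__()
        self.batch_iter = iter(batches)      # list of np.float32 arrays
        self.cache_file = cache_file
        self.batch_size = batches[0].shape[0]
        self.device_input = cuda.mem_alloc(batches[0].nbytes)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batch_iter)
        except StopIteration:
            return None                      # no more data: calibration is done
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
calib_data = [np.random.rand(8, 3, 224, 224).astype(np.float32) for _ in range(10)]
config.int8_calibrator = EntropyCalibrator(calib_data)

engine_bytes = builder.build_serialized_network(network, config)
with open("model_int8.engine", "wb") as f:
    f.write(engine_bytes)
```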
👀 Apply YOLOv8, exported to ONNX or TensorRT (FP16, INT8), to a real-time camera feed
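A bare-bones version of such a camera loop, assuming onnxruntime and OpenCV; the model file `yolov8n.onnx`, the 640x640 input size, and the omission of box decoding and NMS are all simplifications.

```python
# Sketch: run an exported YOLOv8 ONNX model on live webcam frames.
# Assumes onnxruntime + OpenCV; model name and input size are illustrative,
# and box decoding / NMS postprocessing is omitted for brevity.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

cap = cv2.VideoCapture(0)  # default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Resize + BGR->RGB + NCHW float32 in [0, 1]
    img = cv2.cvtColor(cv2.resize(frame, (640, 640)), cv2.COLOR_BGR2RGB)
    blob = np.ascontiguousarray(img.astype(np.float32).transpose(2, 0, 1)[None] / 255.0)
    preds = session.run(None, {input_name: blob})[0]
    # Real code would decode boxes and apply NMS here; just report the raw output shape.
    cv2.putText(frame, f"output: {preds.shape}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("yolov8", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```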
INT8 calibrator for an ONNX model with dynamic batch_size at the input and an NMS module at the output. C++ implementation.
A LLaMA2-7B chatbot with memory running on CPU, optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
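The "smooth quantization" above is in the spirit of SmoothQuant: a per-channel scale migrates activation outliers into the weights so both tensors quantize well to INT8. A minimal PyTorch sketch for one linear layer, with the alpha value and tensor shapes as illustrative assumptions:

```python
# Sketch of a SmoothQuant-style smoothing step for one linear layer.
# Assumes PyTorch; alpha and shapes are illustrative. The per-channel factor
#   s_j = max|X[:, j]|^alpha / max|W[:, j]|^(1 - alpha)
# shifts activation outliers into the weights while keeping X @ W.T unchanged.
import torch

def smooth_linear(x_calib: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """x_calib: (tokens, in_features) activations; weight: (out, in_features)."""
    act_max = x_calib.abs().amax(dim=0)   # per-input-channel activation range
    w_max = weight.abs().amax(dim=0)      # per-input-channel weight range
    s = (act_max.clamp(min=1e-5) ** alpha) / (w_max.clamp(min=1e-5) ** (1 - alpha))
    return x_calib / s, weight * s        # X' = X/s, W' = W*s, so X'W'^T == XW^T

x = torch.randn(64, 256)                  # toy calibration activations
w = torch.randn(128, 256)                 # toy linear weight
x_s, w_s = smooth_linear(x, w)
assert torch.allclose(x @ w.T, x_s @ w_s.T, rtol=1e-3, atol=1e-3)
```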
LLM-Lora-PEFT_accumulate explores optimizations for Large Language Models (LLMs) using PEFT, LoRA, and QLoRA. Contributions of experiments and implementations that improve LLM efficiency are welcome, as are discussions.
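Attaching LoRA adapters with Hugging Face peft typically takes only a few lines. A minimal sketch, where the base model name, rank, and target modules are illustrative assumptions; QLoRA would additionally load the base model in 4-bit via bitsandbytes before this step.

```python
# Sketch: wrap a causal LM with LoRA adapters via Hugging Face peft.
# Assumes transformers + peft; model name, rank r, and target_modules are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_cfg = LoraConfig(
    r=16,                                 # low-rank update dimension
    lora_alpha=32,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # only the adapter weights are trainable
```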
Quantization examples for PTQ (post-training quantization) and QAT (quantization-aware training)
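A minimal sketch of both flows using PyTorch's eager-mode quantization API: PTQ calibrates a trained model on representative data and then converts it, while QAT trains with fake-quant ops before converting. The toy model, fbgemm backend, and random data are illustrative assumptions.

```python
# Sketch: PTQ vs. QAT in PyTorch eager mode. The toy model, 'fbgemm' backend,
# and random data are illustrative assumptions.
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub, get_default_qconfig,
                                   get_default_qat_qconfig, prepare, prepare_qat,
                                   convert)

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()          # marks the float -> int8 boundary
        self.fc = nn.Linear(16, 4)
        self.dequant = DeQuantStub()      # marks the int8 -> float boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

# --- PTQ: calibrate with representative data, then convert to INT8 ---
ptq_model = TinyNet().eval()
ptq_model.qconfig = get_default_qconfig("fbgemm")
prepare(ptq_model, inplace=True)
for _ in range(8):                        # calibration pass records ranges
    ptq_model(torch.randn(32, 16))
convert(ptq_model, inplace=True)

# --- QAT: train with fake-quant ops in the graph, then convert to INT8 ---
qat_model = TinyNet().train()
qat_model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(qat_model, inplace=True)
opt = torch.optim.SGD(qat_model.parameters(), lr=1e-3)
for _ in range(8):                        # stand-in for a real training loop
    loss = qat_model(torch.randn(32, 16)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
convert(qat_model.eval(), inplace=True)
```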
VB.NET API wrapper for LLM inference via chatllm.cpp