post-training-quantization

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python
2503
5 days ago

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape

Python
2257
5 months ago

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Python
849
1 month ago

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Python
350
2 years ago

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

Jupyter Notebook
173
3 years ago

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

Python
170
1 year ago

[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

Jupyter Notebook
104
6 days ago

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Python
38
2 years ago

Post-training static quantization using ResNet18 architecture

Jupyter Notebook
37
5 years ago

PyTorch implementation of our ECCV 2022 paper, "Fine-grained Data Distribution Alignment for Post-Training Quantization"

Python
15
3 years ago

[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation

Python
12
7 months ago

Improved the performance of 8-bit PTQ4DM, especially on FID.

Python
12
2 years ago

An example to quantize MobileNetV2 trained on CIFAR-10 dataset with PyTorch FX graph mode quantization

Python
7
1 year ago

[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation

Python
7
7 months ago