Repository navigation

post-training-quantization

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python
2375
19 hours ago
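
A minimal sketch of the kind of 8-bit post-training quantization this entry covers, shown with ONNX Runtime's stock quantization tooling rather than this repository's own API; the model paths are placeholders.

```python
# Generic INT8 post-training (dynamic) quantization with ONNX Runtime's
# built-in tooling. Illustrative only, not this repository's API; the
# model paths below are placeholders.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",    # FP32 source model (assumed to exist)
    model_output="model_int8.onnx",   # quantized output path
    weight_type=QuantType.QInt8,      # 8-bit signed weights
)
```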

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) with high-bit (>2b) methods (DoReFa; Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference) and low-bit (≤2b) ternary/binary methods (TWN, BNN, XNOR-Net), plus 8-bit post-training quantization (PTQ) via TensorRT; (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusion for quantization. Deployment: TensorRT with fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.

Python
2242
3 days ago
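
A minimal sketch of the batch-normalization fusion step that this entry lists as a prerequisite for quantization, using PyTorch's stock fuse_modules utility rather than micronet's own code; the toy Conv-BN-ReLU module is an assumption.

```python
# Conv + BatchNorm (+ ReLU) fusion before quantization, using PyTorch's
# stock utility. Illustrative only; micronet ships its own fusion code.
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class ConvBNReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = ConvBNReLU().eval()                             # fusion for PTQ requires eval mode
fused = fuse_modules(m, [["conv", "bn", "relu"]])   # folds BN statistics into the conv
print(fused)
```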

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Python
823
1 month ago

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Python
457
1 day ago

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Python
333
2 years ago

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

Jupyter Notebook
172
2 years ago
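
A minimal sketch of the TensorFlow Lite post-training quantization workflow these notebooks demonstrate, assuming a stand-in Keras model and a small random calibration set in place of real data.

```python
# Integer post-training quantization with the TFLite converter (float
# fallback allowed). The MobileNetV2 model and random calibration images
# are stand-ins for the notebooks' real models and data.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)
calib_images = np.random.rand(8, 224, 224, 3).astype(np.float32)

def representative_dataset():
    for img in calib_images:
        yield [img[None, ...]]            # one sample per calibration step

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```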

[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.

Python
156
7 months ago

[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

Jupyter Notebook
62
9 months ago

Post-training static quantization using the ResNet18 architecture

Jupyter Notebook
37
5 years ago
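
A minimal sketch of post-training static quantization of ResNet18 in PyTorch eager mode, as this entry describes, using torchvision's quantization-ready ResNet18 and random tensors in place of a real calibration set.

```python
# Post-training static quantization of ResNet18 in PyTorch eager mode.
# torchvision's quantization-ready ResNet18 already contains quant/dequant
# stubs; random tensors stand in for a real calibration set.
import torch
from torch.ao.quantization import get_default_qconfig, prepare, convert
from torchvision.models.quantization import resnet18

model = resnet18(quantize=False).eval()
model.fuse_model()                              # fold Conv+BN(+ReLU) modules
model.qconfig = get_default_qconfig("fbgemm")   # x86 observer/quant settings

prepare(model, inplace=True)                    # insert observers
with torch.no_grad():                           # calibration pass
    for _ in range(8):
        model(torch.randn(1, 3, 224, 224))
convert(model, inplace=True)                    # swap in int8 modules

print(model)
```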

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Python
35
1 year ago

PyTorch implementation of our ECCV 2022 paper, "Fine-grained Data Distribution Alignment for Post-Training Quantization"

Python
15
3 years ago

[ASP-DAC 2025] Official implementation of "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks"

Python
10
1 month ago

Improved the performance of 8-bit PTQ4DM, especially on FID.

Python
9
2 years ago

[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation

Python
7
1 month ago
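
A minimal sketch of the general idea behind bias compensation for quantization error (not this paper's specific method): estimate the expected output error introduced by weight quantization from calibration data and fold it back into the layer bias.

```python
# Generic bias-correction sketch for a quantized linear layer: absorb the
# expected output error caused by weight quantization into the bias.
# Illustrates the general idea only, not this paper's method.
import torch

def quantize_dequantize(w, num_bits=8):
    """Symmetric per-tensor fake quantization of a weight matrix."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

def compensate_bias(weight, bias, calib_inputs, num_bits=8):
    """Return (fake-quantized weight, corrected bias) for y = x @ W.T + b."""
    w_q = quantize_dequantize(weight, num_bits)
    mu = calib_inputs.mean(dim=0)            # mean input over the calibration set
    bias_new = bias + (weight - w_q) @ mu    # cancels the mean output error
    return w_q, bias_new

# Placeholder shapes: 128 calibration samples, 64-d inputs, 32-d outputs.
W, b = torch.randn(32, 64), torch.zeros(32)
X = torch.randn(128, 64)
W_q, b_new = compensate_bias(W, b, X)
```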