
quantization-aware-training

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python
2474
1 day ago

micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT) with high-bit (>2b) methods (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary/binary methods (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, regular, and group convolutional channel pruning; (3) group convolution structure; (4) batch-normalization fusing for quantization. Deployment: TensorRT, FP32/FP16/INT8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.

Python
2257
3 months ago
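
The fake-quantization step at the heart of QAT, as implemented by libraries like the two above, can be sketched in a few lines (the function name and scalar interface here are illustrative; real implementations operate on tensors and use a straight-through estimator for gradients):

```python
def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Simulate INT8 quantization in float: quantize, clamp, dequantize.

    During QAT this runs in the forward pass so the network learns
    weights that survive rounding; the backward pass typically treats
    round/clamp as identity (straight-through estimator).
    """
    q = round(x / scale)          # map to the integer grid
    q = max(qmin, min(qmax, q))   # clamp to the INT8 range
    return q * scale              # dequantize back to float

# A weight of 0.1234 with scale 0.01 rounds to integer 12 -> 0.12;
# an out-of-range value saturates at qmax (127 * 0.01 = 1.27).
print(fake_quantize(0.1234, 0.01))
print(fake_quantize(10.0, 0.01))
```

The same quantize-clamp-dequantize pattern underlies PTQ as well; QAT differs only in running it during training so the loss sees the rounding error.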

TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.

Python
842
3 months ago

0️⃣1️⃣🤗 BitNet-Transformers: a Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture

Python
305
1 year ago
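
Per the BitNet paper, weights are binarized to ±1 after centering and rescaled by a single per-tensor factor; a plain-Python sketch of that quantizer (the function name and list interface are illustrative, and gradients again flow via a straight-through estimator):

```python
def binarize_weights(w):
    """1-bit weight quantization in the style of BitNet: center the
    weights, take their sign, and keep one per-tensor float scale
    (the mean absolute value) so magnitudes are roughly preserved."""
    n = len(w)
    mean = sum(w) / n
    centered = [x - mean for x in w]
    beta = sum(abs(x) for x in centered) / n     # per-tensor scale
    signs = [1 if x >= 0 else -1 for x in centered]
    return [beta * s for s in signs]

# Every weight collapses to +/- one shared magnitude.
print(binarize_weights([0.5, -0.3, 0.1, -0.3]))
```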

A toolkit for automated structural analysis and modification of PyTorch models, including a model compression algorithm library that automatically analyzes model structure

Python
251
2 years ago

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

Jupyter Notebook
172
3 years ago

Python
44
4 years ago

Train neural networks with joint quantization and pruning of both weights and activations, using any PyTorch modules

Python
42
3 years ago
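
Joint pruning and quantization of the kind this repo describes can be sketched as magnitude pruning followed by fake quantization of the surviving weights (an illustrative list-based sketch, not the repo's API):

```python
def prune_and_quantize(w, sparsity, scale, qmin=-128, qmax=127):
    """Zero out the smallest-magnitude fraction of the weights, then
    fake-quantize (round/clamp/dequantize) the survivors."""
    k = int(len(w) * sparsity)                 # number of weights to prune
    threshold = sorted(abs(x) for x in w)[k - 1] if k else None
    out, pruned = [], 0
    for x in w:
        if k and pruned < k and abs(x) <= threshold:
            out.append(0.0)                    # pruned weight
            pruned += 1
        else:
            q = max(qmin, min(qmax, round(x / scale)))
            out.append(q * scale)              # fake-quantized weight
    return out

# 50% sparsity: the two smallest-magnitude weights are zeroed,
# the rest are snapped to the 0.1-spaced quantization grid.
print(prune_and_quantize([0.5, -0.05, 0.2, 0.01], 0.5, 0.1))
```

Training jointly lets the remaining weights compensate for both the zeros and the rounding error, which is the point of combining the two techniques rather than applying them one after the other post hoc.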

FakeQuantize with Learned Step Size (LSQ+) as an Observer in PyTorch

C++
34
4 years ago
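
LSQ makes the quantizer step size a trained parameter; a scalar sketch of the quantize-and-gradient rule from the LSQ paper (names here are illustrative, real code is tensorized):

```python
def lsq_quantize_with_grad(v, s, qmin=-128, qmax=127):
    """LSQ-style quantization: returns the fake-quantized value and the
    gradient d(vhat)/ds used to train the step size s. The round() is
    treated as identity for gradients (straight-through estimator)."""
    x = v / s
    if x <= qmin:
        return qmin * s, float(qmin)   # clipped low: gradient is qmin
    if x >= qmax:
        return qmax * s, float(qmax)   # clipped high: gradient is qmax
    q = round(x)
    # In range: vhat = q * s, so d(vhat)/ds = q - v/s under STE.
    return q * s, q - x

vhat, ds = lsq_quantize_with_grad(0.1234, 0.01)
print(vhat, ds)   # quantized value and step-size gradient
```

Values inside the range push s toward finer resolution, while clipped values push it toward a wider range; that tension is what lets the step size settle at a good trade-off during training.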

Code for the paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'

Jupyter Notebook
24
3 years ago

Official implementation of "Quantized Spike-driven Transformer" (ICLR 2025)

Python
23
4 months ago