Repository navigation

#

quantization

ymcui/Chinese-LLaMA-Alpaca

中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)

Python
18804
1 年前
UFund-Me/Qbot

[🔥updating ...] AI 自动量化交易机器人(完全本地部署) AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ news qbot-mini: https://github.com/Charmve/iQuant

Jupyter Notebook
11116
5 个月前
bitsandbytes-foundation/bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Python
6932
2 天前

Lossy PNG compressor — pngquant command based on libimagequant library

C
5362
3 个月前

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

Python
4817
9 天前

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

Jupyter Notebook
4389
2 年前

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Python
3081
1 年前
IntelLabs/nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

Python
2939
2 年前

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

Python
2856
1 小时前

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Python
2663
2 年前

Build, customize and control you own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6

Python
2642
7 个月前

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python
2378
2 天前

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python
2281
1 天前

micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference)、Low-Bit(≤2b)/Ternary and Binary(TWN/BNN/XNOR-Net); post-training-quantization(PTQ), 8-bit(tensorrt); 2、 pruning: normal、regular and group convolutional channel pruning; 3、 group convolution structure; 4、batch-normalization fuse for quantization. deploy: tensorrt, fp32/fp16/int8(ptq-calibration)、op-adapt(upsample)、dynamic_shape

Python
2242
4 天前

A list of papers, docs, codes about model quantization. This repo is aimed to provide the info for model quantization research, we are continuously improving the project. Welcome to PR the works (papers, repositories) that are missed by the repo.

2061
2 个月前