quantization

ymcui/Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Python · 18,925 ★ · updated 3 months ago
UFund-Me/Qbot

[🔥 updating ...] AI-powered automated quantitative trading bot, fully local deployment. AI-powered Quantitative Investment Research Platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ News: qbot-mini at https://github.com/Charmve/iQuant

Jupyter Notebook · 14,263 ★ · updated 3 months ago
bitsandbytes-foundation/bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Python · 7,626 ★ · updated 2 days ago
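bitsandbytes is the k-bit quantization backend used by the 🤗 Transformers integration. A minimal sketch of loading a model in 4-bit NF4 through that integration (the model ID below is only a placeholder):

```python
# Hedged sketch: load a causal LM in 4-bit NF4 via the transformers
# integration of bitsandbytes; the model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                    # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)
```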

Lossy PNG compressor: the pngquant command-line tool, built on the libimagequant library

C · 5,445 ★ · updated 3 months ago
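pngquant is a command-line tool; a small sketch of driving it from Python, assuming pngquant is installed and on PATH (--quality takes a min-max percentage range, --output names the result file):

```python
# Hedged sketch: invoke the pngquant CLI via subprocess; assumes
# pngquant is installed and on PATH.
import subprocess

subprocess.run(
    ["pngquant", "--quality=65-80", "--output", "out.png", "input.png"],
    check=True,  # raise if pngquant exits non-zero (e.g. quality target not met)
)
```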

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python · 4,948 ★ · updated 6 months ago
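This description matches the AutoGPTQ project; assuming that API, a sketch of loading an already-quantized GPTQ checkpoint (the model ID is a placeholder) might look like:

```python
# Hedged sketch: assumes this entry is AutoGPTQ; loads an existing
# GPTQ-quantized checkpoint (placeholder model ID) and generates text.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

inputs = tokenizer("Quantization reduces", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```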

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. Docs: https://intellabs.github.io/distiller

Jupyter Notebook · 4,399 ★ · updated 2 years ago

[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python · 3,141 ★ · updated 4 days ago

Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.

Python · 3,139 ★ · updated 2 years ago

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization tools

Python · 3,109 ★ · updated 12 hours ago
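This description matches Hugging Face Optimum; assuming its ONNX Runtime backend, a sketch of exporting a model to ONNX and applying dynamic INT8 quantization:

```python
# Hedged sketch: assumes this entry is Hugging Face Optimum; exports a
# model to ONNX, then applies dynamic INT8 quantization via ONNX Runtime.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
model.save_pretrained("onnx_model")

quantizer = ORTQuantizer.from_pretrained("onnx_model")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx_model_int8", quantization_config=qconfig)
```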
IntelLabs/nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing natural language processing neural networks

Python · 2,939 ★ · updated 3 years ago

Base pretrained models and datasets in PyTorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Python · 2,691 ★ · updated 3 years ago

Build, personalize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6

Python · 2,657 ★ · updated 4 days ago

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model compression techniques for TensorFlow, PyTorch, and ONNX Runtime

Python · 2,503 ★ · updated 5 days ago
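This description matches Intel Neural Compressor; assuming its 2.x API, a sketch of post-training static INT8 quantization on a small PyTorch model (the calibration data here is random stand-in tensors):

```python
# Hedged sketch: assumes this entry is Intel Neural Compressor (2.x API);
# post-training static INT8 quantization of a torchvision model.
import torch
import torchvision
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = torchvision.models.resnet18(weights=None).eval()

# Calibration loader: random tensors stand in for real data here.
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(3, 224, 224), 0) for _ in range(8)], batch_size=4
)

q_model = fit(model=model, conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader)
q_model.save("./int8_resnet18")
```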

Quantized attention that achieves 2-5x speedup over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,470 ★ · updated 7 days ago
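This description matches the SageAttention project; assuming its kernel API, a drop-in call in place of scaled dot-product attention might look like the sketch below (the import path, sageattn signature, and tensor layout are all assumptions):

```python
# Hedged sketch: assumes this entry is SageAttention and that the package
# exposes sageattn(q, k, v, ...) as a drop-in quantized attention kernel.
import torch
from sageattention import sageattn  # assumed import path

# (batch, heads, seq_len, head_dim) layout, half precision on GPU
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

out = sageattn(q, k, v, is_causal=False)  # quantized attention output
```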

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python · 2,466 ★ · updated 17 hours ago