quantization

ymcui/Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Python · 18,925 ★ · updated 3 months ago
UFund-Me/Qbot

[🔥 updating ...] AI-powered automated quantitative trading bot, fully local deployment. AI-powered Quantitative Investment Research Platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ News: qbot-mini at https://github.com/Charmve/iQuant

Jupyter Notebook · 14,263 ★ · updated 3 months ago
bitsandbytes-foundation/bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Python · 7,626 ★ · updated 2 days ago
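bitsandbytes is the k-bit quantization backend used by the 🤗 Transformers integration. A minimal sketch of loading a model in 4-bit NF4 through that integration (the model ID below is only a placeholder):

```python
# Hedged sketch: load a causal LM in 4-bit NF4 via the transformers
# integration of bitsandbytes; the model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",                    # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)
```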

Lossy PNG compressor: the pngquant command-line tool, built on the libimagequant library

C · 5,445 ★ · updated 3 months ago
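pngquant is a command-line tool; a small sketch of driving it from Python, assuming pngquant is installed and on PATH (--quality takes a min-max percentage range, --output names the result file):

```python
# Hedged sketch: invoke the pngquant CLI via subprocess; assumes
# pngquant is installed and on PATH.
import subprocess

subprocess.run(
    ["pngquant", "--quality=65-80", "--output", "out.png", "input.png"],
    check=True,  # raise if pngquant exits non-zero (e.g. quality target not met)
)
```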

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python · 4,948 ★ · updated 6 months ago
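This description matches the AutoGPTQ project; assuming that API, a sketch of loading an already-quantized GPTQ checkpoint (the model ID is a placeholder) might look like:

```python
# Hedged sketch: assumes this entry is AutoGPTQ; loads an existing
# GPTQ-quantized checkpoint (placeholder model ID) and generates text.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

inputs = tokenizer("Quantization reduces", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```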

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. Docs: https://intellabs.github.io/distiller

Jupyter Notebook · 4,399 ★ · updated 2 years ago

[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python · 3,141 ★ · updated 4 days ago

Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.

Python · 3,139 ★ · updated 2 years ago

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization tools

Python · 3,109 ★ · updated 12 hours ago
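This description matches Hugging Face Optimum; assuming its ONNX Runtime backend, a sketch of exporting a model to ONNX and applying dynamic INT8 quantization:

```python
# Hedged sketch: assumes this entry is Hugging Face Optimum; exports a
# model to ONNX, then applies dynamic INT8 quantization via ONNX Runtime.
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
model.save_pretrained("onnx_model")

quantizer = ORTQuantizer.from_pretrained("onnx_model")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx_model_int8", quantization_config=qconfig)
```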
IntelLabs/nlp-architect

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing natural language processing neural networks

Python · 2,939 ★ · updated 3 years ago

Base pretrained models and datasets in PyTorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Python · 2,691 ★ · updated 3 years ago

Build, personalize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6

Python · 2,657 ★ · updated 4 days ago

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model compression techniques for TensorFlow, PyTorch, and ONNX Runtime

Python · 2,503 ★ · updated 5 days ago
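This description matches Intel Neural Compressor; assuming its 2.x API, a sketch of post-training static INT8 quantization on a small PyTorch model (the calibration data here is random stand-in tensors):

```python
# Hedged sketch: assumes this entry is Intel Neural Compressor (2.x API);
# post-training static INT8 quantization of a torchvision model.
import torch
import torchvision
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = torchvision.models.resnet18(weights=None).eval()

# Calibration loader: random tensors stand in for real data here.
calib_loader = torch.utils.data.DataLoader(
    [(torch.randn(3, 224, 224), 0) for _ in range(8)], batch_size=4
)

q_model = fit(model=model, conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader)
q_model.save("./int8_resnet18")
```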

Quantized attention that achieves 2-5x speedup over FlashAttention and 3-11x over xformers, without losing end-to-end metrics across language, image, and video models.

Cuda · 2,470 ★ · updated 7 days ago
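This description matches the SageAttention project; assuming its kernel API, a drop-in call in place of scaled dot-product attention might look like the sketch below (the import path, sageattn signature, and tensor layout are all assumptions):

```python
# Hedged sketch: assumes this entry is SageAttention and that the package
# exposes sageattn(q, k, v, ...) as a drop-in quantized attention kernel.
import torch
from sageattention import sageattn  # assumed import path

# (batch, heads, seq_len, head_dim) layout, half precision on GPU
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

out = sageattn(q, k, v, is_causal=False)  # quantized attention output
```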

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python · 2,466 ★ · updated 17 hours ago