gptq
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Large Language Models for All, 🦙 Cult and More, Stay in touch!
Advanced Quantization Algorithm for LLMs/VLMs.
Run any Large Language Model behind a unified API
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
A guide on how to use GPTQ models with LangChain.
ChatSakura: open-source multilingual conversational model.
Private self-improvement coaching with open-source LLMs
This repository is for profiling, extracting, visualizing, and reusing generative AI weights, with the goals of building more accurate models and auditing/scanning weights at rest to identify knowledge domains for risk.
Run GGUF LLM models in the latest version of TextGen-webui.
A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API for an AI companion, for building more complex systems.
Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"
This project develops a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
LLM quantization techniques: absmax, zero-point, GPTQ and GGUF
Quantizing LLMs using GPTQ
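Several of the repositories above cover the absmax and zero-point quantization techniques. As a minimal illustration of both (not tied to any specific repository; the function names and the NumPy-only setup are assumptions for this sketch), absmax symmetrically scales a tensor by 127 / max(|x|), while zero-point quantization maps the full [min, max] range onto [-128, 127] using an offset:

```python
import numpy as np

def absmax_quantize(x):
    # Symmetric "absmax" INT8 quantization: scale so the largest
    # magnitude value maps to 127.
    scale = 127 / np.max(np.abs(x))
    x_q = np.round(scale * x).astype(np.int8)
    return x_q, scale

def zeropoint_quantize(x):
    # Asymmetric zero-point INT8 quantization: map [min, max]
    # onto [-128, 127] with a scale and an integer offset.
    x_range = np.max(x) - np.min(x)
    x_range = x_range if x_range != 0 else 1.0
    scale = 255 / x_range
    zeropoint = int(np.round(-scale * np.min(x)) - 128)
    x_q = np.clip(np.round(scale * x + zeropoint), -128, 127).astype(np.int8)
    return x_q, scale, zeropoint

rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4)).astype(np.float32)

q_abs, s_abs = absmax_quantize(weights)
dequant_abs = q_abs.astype(np.float32) / s_abs

q_zp, s_zp, zp = zeropoint_quantize(weights)
dequant_zp = (q_zp.astype(np.float32) - zp) / s_zp
```

Dequantizing (dividing by the scale, after subtracting the zero point in the asymmetric case) recovers the weights up to rounding error of at most half a quantization step; GPTQ and GGUF build on the same idea but choose quantization parameters per group and correct the error layer by layer.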