gptq
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.
Large Language Models for All, 🦙 Cult and More. Stay in touch!
Run any Large Language Model behind a unified API
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
A guide on how to use GPTQ models with LangChain
An OpenAI-compatible API that integrates LLM, Embedding, and Reranker services
Private self-improvement coaching with open-source LLMs
Run GGUF LLM models in the latest versions of TextGen-webui and koboldcpp
ChatSakura: Open-source multilingual conversational model
A repository for profiling, extracting, visualizing, and reusing generative AI weights, aiming to build more accurate models and to audit/scan weights at rest to identify knowledge domains and risks.
A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence): a REST API for an AI companion, intended as a building block for more complex systems
A NEPSE chatbot built on an open-source LLM, incorporating sentence transformers, a vector database, and reranking.
Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"
Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.
LLM quantization techniques: absmax, zero-point, GPTQ and GGUF
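As a quick illustration of the first two techniques named in the list above, here is a minimal sketch of absmax (symmetric) and zero-point (asymmetric) INT8 quantization in plain Python. Function names and the list-based interface are illustrative, not taken from any of the repositories listed; real implementations operate on tensors per channel or per group.

```python
def absmax_quantize(xs):
    """Symmetric INT8 quantization: scale so the largest |x| maps to 127."""
    scale = 127.0 / max(abs(x) for x in xs)
    q = [round(x * scale) for x in xs]  # values land in [-127, 127]
    return q, scale

def zeropoint_quantize(xs):
    """Asymmetric INT8 quantization: map [min(xs), max(xs)] onto [-128, 127]."""
    lo, hi = min(xs), max(xs)
    x_range = (hi - lo) or 1.0          # avoid division by zero for constant input
    scale = 255.0 / x_range
    zero_point = round(-scale * lo - 128)
    q = [max(-128, min(127, round(x * scale + zero_point))) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Recover approximate floats from quantized integers."""
    return [(v - zero_point) / scale for v in q]

weights = [-1.5, 0.2, 0.0, 3.1]
q_abs, s = absmax_quantize(weights)
q_zp, s2, zp = zeropoint_quantize(weights)
print(dequantize(q_abs, s))      # close to the original weights
print(dequantize(q_zp, s2, zp))  # asymmetric variant, also close
```

Absmax keeps zero exactly representable but wastes range on one side when the distribution is skewed; zero-point quantization uses the full INT8 range at the cost of storing an extra offset per tensor or group.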