gptq

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

low-precision pruning sparsity auto-tuning knowledge-distillation quantization quantization-aware-training post-training-quantization smoothquant large-language-models gptq int8

Python

2503

281

5 days ago

ModelCloud / GPTQModel

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

gptq peft quantization sglang transformers vllm

Python

811

115

1 hour ago

shm007g / LLaMA-Cult-and-More

Large Language Models for All, 🦙 Cult and More, Stay in touch !

alpaca ChatGPT gpt llama ggml gpt4 gptq vicuna PyTorch Tensorflow transformers deepspeed llm

HTML

444

2 years ago

bobazooba / xllm

🦖 X—LLM: Cutting Edge & Easy LLM Finetuning

alpaca cerebras ChatGPT deep-learning deep-neural-networks gpt gpt-4 gptq large-language-models llama llama2 llm mistral openai vicuna Zephyr RTOS PyTorch torch

Python

406

2 years ago

1b5d / llm-api

Run any Large Language Model behind a unified API

ChatGPT gptq huggingface langchain llama llamacpp llm llm-inference machine-learning Python

Python

170

2 years ago

chenhunghan / ialacol

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

ai helm Kubernetes langchain llm Python openai cloudnative ggml gpu llamacpp CUDA gptq llm-inference llm-serving

Python

146

2 years ago

abhinand5 / gptq_for_langchain

A guide about how to use GPTQ models with langchain

ai gpt gptq langchain language-model llm quantization wizardlm

Jupyter Notebook

2 years ago

taishan1994 / LLM-Quantization

Notes and summaries on quantizing LLMs.

gptq llm quantization qwen3

Python

15 days ago

ziwang-com / zero-lora

zero: zero-training LLM parameter tuning

gpt gptq llama llm lora

2 years ago

hcd233 / Aris-AI-Model-Server

An OpenAI Compatible API which integrates LLM, Embedding and Reranker.

ai embedding FastAPI gptq llm MLX openai-compatible-api rag reranker sentence-transformers vllm

Python

1 month ago

tripathiarpan20 / self-improvement-4all

Private self-improvement coaching with open-source LLMs

faiss langchain Python gptq transformers

Python

2 years ago

seyf1elislam / LocalLLM_OneClick_Colab

Run gguf LLM models in Latest Version TextGen-webui and koboldcpp

colab-notebook gguf gptq llm localllama localllm Python

Jupyter Notebook

2 months ago

chinoll / chatsakura

ChatSakura: Open-source multilingual conversational model.

gradio PyTorch bloom ChatGPT instruct-gpt llm gptq transformers

Python

3 years ago

matlok-ai / bampe-weights

This repository is for profiling, extracting, visualizing and reusing generative AI weights to hopefully build more accurate AI models and audit/scan weights at rest to identify knowledge domains for risk(s).

ai blip2 foundational-models generative-ai gptq image-to-image llm safetensors stable-diffusion tiff transformers blender blender-python deep-learning

Python

2 years ago

Aqirito / A.L.I.C.E

A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API of A.I companion for creating more complex system

langchain langchain-python llm text-generation text-to-speech tts vits Anime ai Genshin Impact waifu FastAPI gptq huggingface-transformers pygmalion REST API

Python

8 months ago

bobazooba / shurale

Conversation AI model for open domain dialogs

cerebras ChatGPT deep-learning deep-neural-networks gpt gpt-4 gptq large-language-models llama llama2 llm mistral nlp openai PyTorch torch transformers vicuna

Python

2 years ago

SujanNeupane42 / NEPSE-Chatbot-Using-Retrieval-augmented-generation-and-reranking

This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, vector database and reranking.

faiss Flask gptq langchain llm Python retrieval-augmented-generation sentence-transformers vector-database

Jupyter Notebook

2 years ago

upunaprosk / quantized-lm-confidence

Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"

compression gptq nlp quantization efficient-model large-language-models llm

Jupyter Notebook

9 months ago

lpalbou / model-quantizer

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

cross-platform gptq huggingface inference llm machine-learning model-compression nlp optimization Python PyTorch quantization transformers

Python

7 months ago

amajji / LLM-Quantization-Techniques-Absmax-Zeropoint-GPTQ-GGUF

LLM quantization techniques: absmax, zero-point, GPTQ and GGUF

ggml gguf gptq llamacpp llm quantization quantization-aware-training

Jupyter Notebook

1 year ago