#gptq

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python
2473
20 hours ago

Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Python
737
1 hour ago

Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Transformers, and vLLM. Export your models effortlessly to autogptq, autoawq, gguf and autoround formats with high accuracy even at extremely low bit precision.

Python
591
36 minutes ago

Large Language Models for All, 🦙 Cult and More. Stay in touch!

HTML
444
2 years ago
Python
171
2 years ago
Jupyter Notebook
40
2 years ago

An OpenAI-compatible API that integrates LLM, Embedding, and Reranker models.

Python
16
1 month ago

Private self-improvement coaching with open-source LLMs

Python
15
1 year ago

Notes and summaries on quantizing LLMs.

Python
14
7 days ago

Run GGUF LLM models in the latest versions of TextGen-webui and koboldcpp.

Jupyter Notebook
14
13 days ago

ChatSakura: Open-source multilingual conversational large language model.

Python
13
2 years ago

This repository is for profiling, extracting, visualizing and reusing generative AI weights to hopefully build more accurate AI models and audit/scan weights at rest to identify knowledge domains for risk(s).

Python
9
2 years ago

A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API for an AI companion, intended for building more complex systems.

Python
9
6 months ago

This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, a vector database, and reranking.

Jupyter Notebook
3
2 years ago

Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"

Jupyter Notebook
3
8 months ago

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

Python
2
5 months ago