gptq

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python
2503
5 days ago

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Python
811
1 hour ago

Large Language Models for All, 🦙 Cult and More, Stay in touch !

HTML
444
2 years ago
Python
170
2 years ago
Jupyter Notebook
39
2 years ago

Notes summarizing LLM quantization.

Python
33
15 days ago

An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.

Python
17
1 month ago

Private self-improvement coaching with open-source LLMs

Python
15
2 years ago

Run gguf LLM models in Latest Version TextGen-webui and koboldcpp

Jupyter Notebook
15
2 months ago

ChatSakura: Open-source multilingual conversational large language model.

Python
13
3 years ago

This repository is for profiling, extracting, visualizing and reusing generative AI weights to hopefully build more accurate AI models and audit/scan weights at rest to identify knowledge domains for risk(s).

Python
9
2 years ago

A.L.I.C.E (Artificial Labile Intelligence Cybernated Existence). A REST API of A.I companion for creating more complex system

Python
9
8 months ago

This project will develop a NEPSE chatbot using an open-source LLM, incorporating sentence transformers, vector database and reranking.

Jupyter Notebook
3
2 years ago

Code for the NAACL paper "When Quantization Affects Confidence of Large Language Models?"

Jupyter Notebook
3
9 months ago

Effortlessly quantize, benchmark, and publish Hugging Face models with cross-platform support for CPU/GPU. Reduce model size by 75% while maintaining performance.

Python
2
7 months ago