#on-device-llms

prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters

C++
997
1 month ago

[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

Python
29
21 days ago