# llama-cpp

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!

TypeScript · 11000 stars · updated 1 year ago

A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) on your local device efficiently.

C# · 3323 stars · updated 4 days ago
Mobile-Artificial-Intelligence/maid

Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.

Dart · 2142 stars · updated 23 days ago
withcatai/node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.

TypeScript · 1629 stars · updated 8 days ago
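To illustrate what that schema contract looks like, here is a minimal post-hoc check against a hypothetical flat schema. Note the library itself constrains tokens *during* generation, which guarantees conformance rather than merely detecting violations afterwards; this sketch only shows the contract being checked.

```python
import json

# Hypothetical flat schema: required keys and their expected types.
# (node-llama-cpp enforces a full JSON schema at generation time;
# this stand-alone sketch just validates finished output.)
SCHEMA = {"answer": str, "confidence": float}

def conforms(raw: str) -> bool:
    """Return True if raw is a JSON object matching SCHEMA exactly."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (isinstance(obj, dict)
            and set(obj) == set(SCHEMA)
            and all(isinstance(obj[k], t) for k, t in SCHEMA.items()))

print(conforms('{"answer": "42", "confidence": 0.9}'))  # True
print(conforms('{"answer": "42"}'))                     # False
```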

prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters

C++ · 997 stars · updated 1 month ago
C · 627 stars · updated 16 hours ago

Build and run AI agents using Docker Compose. A collection of ready-to-use examples for orchestrating open-source LLMs, tools, and agent runtimes.

TypeScript · 431 stars · updated 14 days ago

Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.

Python · 356 stars · updated 3 months ago
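The memory claim above can be sanity-checked with back-of-envelope arithmetic. A sketch assuming hypothetical 7B-class shapes (32 layers, 32 KV heads, head dimension 128, 8192-token context): the ideal saving from 8-bit keys plus 4-bit values is 62.5%, and KVSplit's measured 59% is consistent with that once quantization-block overhead is included.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens,
                   key_bits, value_bits):
    """KV cache size: one key and one value vector per layer,
    per KV head, per token, at the given bit widths."""
    per_token = n_kv_heads * head_dim * (key_bits + value_bits) / 8
    return n_layers * n_tokens * per_token

# Hypothetical 7B-class shapes; real models vary (e.g. grouped-query
# attention uses fewer KV heads, shrinking both figures equally).
fp16 = kv_cache_bytes(32, 32, 128, 8192, 16, 16)
k8v4 = kv_cache_bytes(32, 32, 128, 8192, 8, 4)
print(f"FP16  KV cache: {fp16 / 2**30:.2f} GiB")  # 4.00 GiB
print(f"K8/V4 KV cache: {k8v4 / 2**30:.2f} GiB")  # 1.50 GiB
print(f"ideal saving:   {1 - k8v4 / fp16:.1%}")   # 62.5%
```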

Showcases how to run a model locally and offline, free of OpenAI dependencies.

Python · 288 stars · updated 1 year ago

Review/Check GGUF files and estimate the memory usage and maximum tokens per second.

Go · 198 stars · updated 2 days ago
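The core of such a memory estimate is just parameter count times bits per weight. A sketch using bits-per-weight figures for common llama.cpp quantization types (Q8_0 and Q4_0 store a per-block FP16 scale for every 32 weights, giving exactly 8.5 and 4.5 bits per weight; real GGUF files add metadata and a KV cache on top):

```python
# Bits per weight including per-block scales; file metadata,
# tokenizer data, and KV cache are not counted.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}

def weight_gib(n_params: float, quant: str) -> float:
    """Estimated weight memory in GiB for n_params parameters."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30

for quant in ("F16", "Q8_0", "Q4_0"):
    print(f"7B @ {quant}: ~{weight_gib(7e9, quant):.1f} GiB")
```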

Run LLMs locally. A Clojure wrapper for llama.cpp.

Clojure · 166 stars · updated 5 months ago

Booster: an open accelerator for LLMs, offering better inference and debugging for AI hackers.

C++ · 160 stars · updated 1 year ago

A pure-Rust LLM inference engine (supporting LLM-based multimodal models such as Spark-TTS), powered by the Candle framework.

Rust · 153 stars · updated 25 days ago