llama-cpp
A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) efficiently on your local device.
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
Build and run AI agents using Docker Compose. A collection of ready-to-use examples for orchestrating open-source LLMs, tools, and agent runtimes.
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
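KVSplit itself patches llama.cpp's Metal path, but stock llama.cpp already exposes the underlying idea of separate key and value cache types. A minimal sketch of that split-precision setup, using the llama-cpp-python bindings rather than KVSplit's own tooling; the model path is a placeholder:

```python
# Sketch: mixed-precision KV cache via llama-cpp-python (analogous to
# KVSplit's 8-bit-key / 4-bit-value split; not KVSplit's own API).
from llama_cpp import Llama, GGML_TYPE_Q8_0, GGML_TYPE_Q4_0

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,             # long contexts are where KV cache savings matter
    type_k=GGML_TYPE_Q8_0,  # 8-bit keys: keys are more precision-sensitive
    type_v=GGML_TYPE_Q4_0,  # 4-bit values: values tolerate coarser quantization
    flash_attn=True,        # llama.cpp requires flash attention for a quantized V cache
)

out = llm("Q: Why quantize the KV cache?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```

The asymmetry is the point: quantizing keys hurts attention quality more than quantizing values, so spending 8 bits on keys and 4 on values keeps most of the memory savings at a small quality cost.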
This repo showcases how to run a model locally and offline, with no OpenAI dependencies.
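A minimal sketch of that local-only pattern, assuming the llama-cpp-python bindings and a GGUF model already downloaded to disk (the path and model name are placeholders); nothing here makes a network call:

```python
# Sketch: fully offline chat completion with llama-cpp-python.
# The GGUF path is a placeholder for any locally downloaded model.
from llama_cpp import Llama

llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what a GGUF file is."},
    ],
    max_tokens=128,
)
print(reply["choices"][0]["message"]["content"])
```

Because create_chat_completion returns an OpenAI-style response dict, code written against the OpenAI chat format can often switch to a local model by changing only the client setup.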
Local ML voice chat using high-end models.
Making offline AI models accessible to all types of edge devices.