Frameworks, tools, and resources for Large Language Models (LLMs), including training, inference (vLLM, llama.cpp), and RAG.
Large Language Models (LLMs)
Repositories
Ollama is a lightweight framework for running and managing open-source large language models locally. It provides a simple CLI and REST API for building AI applications, supporting models like Llama, Gemma, and Mistral with easy integration into various tools and platforms.
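Ollama's REST API can be called with nothing but the Python standard library. The sketch below targets the `/api/generate` endpoint on Ollama's default port (11434); the model name is a placeholder for whatever model you have pulled locally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to a locally running Ollama server and return the reply."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running server and a pulled model, e.g. `ollama pull llama3.2`):
# print(generate("llama3.2", "Why is the sky blue?"))
```

The same payload shape works from any language with an HTTP client, which is what makes Ollama easy to integrate into other tools.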
LangChain is a framework for building agents and LLM-powered applications. It helps chain together interoperable components and third-party integrations to simplify AI application development while future-proofing decisions as technology evolves.
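The "chaining interoperable components" idea can be illustrated without LangChain itself. Below is a toy, pure-Python analogue of its pipe-style composition (LCEL), with a stand-in function in place of a real model call; the class and names are invented for illustration, not LangChain's API.

```python
class Runnable:
    """Minimal stand-in for LangChain's pipe-composable components."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # a | b produces a new component that runs a, then feeds its output to b
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda q: f"Answer briefly: {q}")
fake_llm = Runnable(lambda p: p.upper())  # stand-in for a real model call
parser = Runnable(lambda s: s.strip())

chain = prompt | fake_llm | parser
result = chain.invoke("what is RAG?")  # "ANSWER BRIEFLY: WHAT IS RAG?"
```

In LangChain proper, the pieces on either side of `|` are prompts, chat models, retrievers, and output parsers, but the composition mechanics are the same.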
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners such as Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671B total parameters and 37B activated per token. It features Multi-head Latent Attention, FP8 training, and multi-token prediction, achieving performance comparable to leading closed-source models while maintaining training efficiency and stability.
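The "37B activated per token" figure comes from Mixture-of-Experts routing: a gate scores every expert, and only the top-k experts actually run for each token. A toy sketch of that routing step follows; it is not DeepSeek's implementation (which additionally uses shared experts and auxiliary-loss-free load balancing), just the core top-k idea.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the top-k experts for one token and renormalize their gate weights.

    Returns (expert_index, weight) pairs; the token's output is the
    weighted sum of just those k experts' outputs.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))
```

With, say, 4 experts and k=2, `route([0.1, 2.0, -1.0, 0.5])` activates only experts 1 and 3, which is how a model's active parameter count stays far below its total.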
DeepSeek-R1 is a first-generation reasoning model achieving performance comparable to OpenAI-o1 in math, code, and reasoning tasks. It features a 671B parameter MoE architecture and is open-sourced under MIT license, including distilled smaller models.
GPT4All is an open-source ecosystem that enables you to run powerful large language models (LLMs) privately on everyday desktops and laptops. No API calls or GPUs required: just download the app and start chatting with local AI models.

vLLM is a fast and easy-to-use library for LLM inference and serving. Originally developed at UC Berkeley, it features state-of-the-art throughput, efficient memory management with PagedAttention, continuous batching, and seamless integration with Hugging Face models.
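PagedAttention's core trick borrows virtual-memory paging: each sequence's KV cache lives in fixed-size blocks that need not be contiguous, so memory is allocated on demand rather than reserved up front for the maximum sequence length. The toy block table below sketches only that mapping idea; the block size and class names are invented for illustration and are not vLLM's internals.

```python
BLOCK_SIZE = 16  # tokens per physical block (illustrative value)

class Allocator:
    """Hands out physical block ids; a real allocator would also free/reuse them."""
    def __init__(self):
        self.next_id = 0

    def allocate(self) -> int:
        self.next_id += 1
        return self.next_id - 1

class BlockTable:
    """Maps a sequence's logical token positions onto non-contiguous blocks."""
    def __init__(self, allocator: Allocator):
        self.allocator = allocator
        self.blocks = []       # physical block ids, in logical order
        self.num_tokens = 0

    def append_token(self) -> None:
        # Only grab a new block when the current one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self.allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, pos: int) -> int:
        """Translate a logical token position to a physical cache slot."""
        return self.blocks[pos // BLOCK_SIZE] * BLOCK_SIZE + pos % BLOCK_SIZE
```

Because blocks are allocated lazily and per-sequence, many requests can share GPU memory tightly, which is what enables vLLM's continuous batching throughput.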
A comprehensive academic AI assistant supporting multiple LLMs (GPT/GLM/Qwen/DeepSeek). Specialized in paper translation, polishing, code analysis, and academic writing, with a modular plugin system and customizable shortcuts.
Official Meta Llama 2 inference code repository. Provides a minimal implementation to load and run Llama models (7B-70B parameters) for text completion and chat applications. Includes model weights, tokenizer, and example scripts for local deployment.
xAI's Grok-1: a 314B-parameter Mixture-of-Experts model with a JAX implementation. Open-source weights and architecture for advanced AI research and deployment.
LlamaIndex is an open-source data framework for building LLM applications with retrieval-augmented generation (RAG). It provides data connectors, indexing tools, and query interfaces to enhance LLMs with private data.
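The retrieval step at the heart of RAG can be sketched in a few lines: split documents into chunks, score each chunk against the query, and hand the top-k chunks to the LLM as context. The word-overlap scorer below is a deliberately crude stand-in for the embedding similarity that frameworks like LlamaIndex actually use; all function names here are illustrative, not LlamaIndex's API.

```python
def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size word chunks (real splitters respect structure)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    """Crude relevance score: count of passage words that appear in the query."""
    q = set(query.lower().split())
    return sum(1 for w in passage.lower().split() if w in q)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks to prepend to the LLM prompt."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]
```

A RAG framework wraps exactly this loop with data connectors (loading), an index (fast lookup instead of a linear scan), and a query interface that stuffs the retrieved chunks into the model's prompt.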
A comprehensive Gradio web UI for running Large Language Models locally with 100% privacy. Supports text generation, vision models, tool-calling, training, image generation, and OpenAI-compatible API.
Microsoft's official inference framework for 1-bit LLMs, providing fast and lossless inference on CPU and GPU with optimized kernels for efficient edge deployment.
LightRAG is a lightweight and efficient Retrieval-Augmented Generation framework that integrates knowledge graphs with vector retrieval for enhanced document understanding. It supports multiple storage backends, multimodal processing, and provides both API and Web UI interfaces.
Qwen3 is an advanced open-source LLM series by Alibaba Cloud, featuring dual thinking/non-thinking modes, 1M token context, multilingual support, and state-of-the-art reasoning capabilities for complex problem-solving.
Open R1 is a community-driven project to fully reproduce DeepSeek-R1's reasoning capabilities. It provides training pipelines, evaluation scripts, and datasets for SFT, GRPO, and data generation, enabling transparent AI reasoning model development.
A feature-rich local LLM frontend for power users, supporting multiple AI APIs, image generation, TTS, and extensive customization for immersive roleplaying experiences.