llm-inference

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ · 76485 stars · updated 3 months ago
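GPT4All also ships Python bindings alongside the desktop app. Below is a minimal sketch of what on-device generation looks like with the `gpt4all` package; the model filename is an assumption for illustration, and the library downloads the weights on first use if they are not already cached.

```python
# Minimal sketch of local inference with the gpt4all Python bindings
# (pip install gpt4all). The model filename is illustrative.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # runs fully on-device

with model.chat_session():  # keeps multi-turn context between calls
    reply = model.generate(
        "Explain what LLM inference means in one sentence.",
        max_tokens=128,
    )
    print(reply)
```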

This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).

HTML · 20209 stars · updated 16 days ago

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python · 12637 stars · updated 5 days ago

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

Python · 11699 stars · updated 1 day ago
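"OpenAI-compatible" means any standard OpenAI SDK can talk to the self-hosted server. The sketch below uses the official `openai` Python client pointed at a local deployment; the base URL, port, and model name are assumptions, so substitute whatever the deployed endpoint actually exposes.

```python
# Querying a self-hosted, OpenAI-compatible endpoint with the openai client.
# base_url, port, and model name are placeholders for the real deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what an inference server does."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```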

Official inference library for Mistral models

Jupyter Notebook · 10422 stars · updated 5 months ago

High-speed Large Language Model Serving for Local Deployment

C++ · 8306 stars · updated 18 days ago
bentoml/BentoML

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

Python · 7995 stars · updated 1 day ago
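As a rough sketch of what "build model inference APIs" looks like in practice, here is a minimal service in BentoML's current 1.x decorator style. The class name, endpoint, and the summarization pipeline are illustrative choices, not part of the project's examples.

```python
# A minimal BentoML inference service (sketch, BentoML 1.2+ service style).
# The service/class name and the summarization model are illustrative only.
import bentoml
from transformers import pipeline


@bentoml.service
class Summarizer:
    def __init__(self) -> None:
        # The model loads once per worker when the service starts.
        self.pipeline = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

    @bentoml.api
    def summarize(self, text: str) -> str:
        # Each call becomes an HTTP endpoint handled by the running service.
        return self.pipeline(text)[0]["summary_text"]
```

Assuming the file is saved as `service.py`, it would typically be served locally with `bentoml serve service:Summarizer`.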

🚀 Billed as the best-performing mobile real-time conversational digital human available. Supports local deployment and multimodal interaction (voice, text, facial expressions) with response latency under 1.5 seconds, suited to live streaming, education, customer service, finance, government, and other scenarios with strict privacy and real-time requirements. Works out of the box and is developer-friendly.

C++ · 7373 stars · updated 17 hours ago

Python · 6879 stars · updated 14 hours ago

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Python · 4453 stars · updated 1 day ago
xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 4403 stars · updated 12 hours ago

The smart edge and AI gateway for agents. Arch is a high-performance proxy server that handles the low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to LLMs. Natively designed to process prompts, it is framework-agnostic and helps you build agents faster.

Rust · 3525 stars · updated 17 hours ago

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python · 3384 stars · updated 3 months ago
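The multi-LoRA idea is that the server hosts one base model and swaps in the requested fine-tuned adapter per request. The sketch below sends such a request over HTTP in LoRAX's TGI-style REST format; the host, port, and adapter name are assumptions for illustration.

```python
# Sketch of a multi-LoRA request: one base model, adapter chosen per request.
# Host, port, and the adapter repo name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Classify the sentiment of: 'the battery life is great'",
        "parameters": {
            "max_new_tokens": 32,
            "adapter_id": "my-org/sentiment-lora",  # hypothetical fine-tuned adapter
        },
    },
    timeout=60,
)
print(resp.json()["generated_text"])
```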