llm-inference

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++
73139
1 month ago

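This project shares the technical principles behind large models along with hands-on experience (LLM engineering and deploying LLM applications in practice).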

HTML
16651
6 days ago

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python
11987
2 days ago

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

Python
11154
3 days ago

Official inference library for Mistral models

Jupyter Notebook
10179
1 month ago

High-speed Large Language Model Serving for Local Deployment

C++
8174
2 months ago

bentoml/BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python
7631
2 days ago

Python
6122
2 days ago

xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

Python
3852
1 day ago

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

Python
2999
2 days ago

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python
2952
11 hours ago

FlashInfer: Kernel Library for LLM Serving

Cuda
2679
1 day ago

Code examples and resources for DBRX, a large language model developed by Databricks

Python
2551
1 year ago