llm-inference

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ · 76485 stars · updated 3 months ago
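GPT4All also ships Python bindings alongside the desktop app. Below is a minimal sketch of what on-device generation looks like with the `gpt4all` package; the model filename is an assumption for illustration, and the library downloads the weights on first use if they are not already cached.

```python
# Minimal sketch of local inference with the gpt4all Python bindings
# (pip install gpt4all). The model filename is illustrative.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # runs fully on-device

with model.chat_session():  # keeps multi-turn context between calls
    reply = model.generate(
        "Explain what LLM inference means in one sentence.",
        max_tokens=128,
    )
    print(reply)
```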

This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).

HTML · 20209 stars · updated 16 days ago

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python · 12637 stars · updated 5 days ago

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

Python · 11699 stars · updated 1 day ago
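"OpenAI-compatible" means any standard OpenAI SDK can talk to the self-hosted server. The sketch below uses the official `openai` Python client pointed at a local deployment; the base URL, port, and model name are assumptions, so substitute whatever the deployed endpoint actually exposes.

```python
# Querying a self-hosted, OpenAI-compatible endpoint with the openai client.
# base_url, port, and model name are placeholders for the real deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what an inference server does."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```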

Official inference library for Mistral models

Jupyter Notebook · 10422 stars · updated 5 months ago

High-speed Large Language Model Serving for Local Deployment

C++ · 8306 stars · updated 18 days ago
bentoml/BentoML

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

Python · 7995 stars · updated 1 day ago
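As a rough sketch of what "build model inference APIs" looks like in practice, here is a minimal service in BentoML's current 1.x decorator style. The class name, endpoint, and the summarization pipeline are illustrative choices, not part of the project's examples.

```python
# A minimal BentoML inference service (sketch, BentoML 1.2+ service style).
# The service/class name and the summarization model are illustrative only.
import bentoml
from transformers import pipeline


@bentoml.service
class Summarizer:
    def __init__(self) -> None:
        # The model loads once per worker when the service starts.
        self.pipeline = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

    @bentoml.api
    def summarize(self, text: str) -> str:
        # Each call becomes an HTTP endpoint handled by the running service.
        return self.pipeline(text)[0]["summary_text"]
```

Assuming the file is saved as `service.py`, it would typically be served locally with `bentoml serve service:Summarizer`.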

🚀 Billed as the best-performing mobile real-time conversational digital human available. Supports local deployment and multimodal interaction (voice, text, facial expressions) with response latency under 1.5 seconds, suited to live streaming, education, customer service, finance, government, and other scenarios with strict privacy and real-time requirements. Works out of the box and is developer-friendly.

C++ · 7373 stars · updated 17 hours ago

Python · 6879 stars · updated 14 hours ago

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Python · 4453 stars · updated 1 day ago
xlite-dev/Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python · 4403 stars · updated 12 hours ago

The smart edge and AI gateway for agents. Arch is a high-performance proxy server that handles the low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to LLMs. Natively designed to process prompts, it is framework-agnostic and helps you build agents faster.

Rust · 3525 stars · updated 17 hours ago

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python · 3384 stars · updated 3 months ago
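The multi-LoRA idea is that the server hosts one base model and swaps in the requested fine-tuned adapter per request. The sketch below sends such a request over HTTP in LoRAX's TGI-style REST format; the host, port, and adapter name are assumptions for illustration.

```python
# Sketch of a multi-LoRA request: one base model, adapter chosen per request.
# Host, port, and the adapter repo name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Classify the sentiment of: 'the battery life is great'",
        "parameters": {
            "max_new_tokens": 32,
            "adapter_id": "my-org/sentiment-lora",  # hypothetical fine-tuned adapter
        },
    },
    timeout=60,
)
print(resp.json()["generated_text"])
```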