inference
A high-throughput and memory-efficient inference and serving engine for LLMs
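A minimal sketch of vLLM's offline batch API, assuming a CUDA-capable GPU; the model name is just an example and any supported Hugging Face causal LM works:

```python
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=32)

llm = LLM(model="facebook/opt-125m")          # loads weights onto the GPU
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)             # first completion per prompt
```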
Making large AI models cheaper, faster, and more accessible
Port of OpenAI's Whisper model in C/C++
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
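A hedged sketch of wrapping a PyTorch model with `deepspeed.initialize`; the config values are illustrative, not recommendations, and real runs are typically launched with the `deepspeed` CLI:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},        # fp16 assumes a GPU is available
    "zero_optimization": {"stage": 1},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Training then goes through the engine's own step methods:
#   loss = model_engine(batch)
#   model_engine.backward(loss)
#   model_engine.step()
```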
Cross-platform, customizable ML solutions for live and streaming media.
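A short sketch using MediaPipe's legacy Solutions API for face detection; `photo.jpg` is a placeholder input image, and newer MediaPipe releases favor the Tasks API instead:

```python
import cv2
import mediapipe as mp

image = cv2.imread("photo.jpg")               # placeholder input
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB

with mp.solutions.face_detection.FaceDetection(
    model_selection=0, min_detection_confidence=0.5
) as detector:
    results = detector.process(rgb)
    for det in results.detections or []:      # detections is None if no faces
        box = det.location_data.relative_bounding_box
        print(f"face at x={box.xmin:.2f}, y={box.ymin:.2f}, "
              f"w={box.width:.2f}, h={box.height:.2f}")
```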
Faster Whisper transcription with CTranslate2
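A minimal transcription sketch with faster-whisper; the model size and `audio.mp3` are placeholders:

```python
from faster_whisper import WhisperModel

# int8 keeps this runnable on CPU; use device="cuda" for a GPU
model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3")
print(f"detected language: {info.language}")

for segment in segments:  # segments is a lazy generator
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```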
Machine Learning Engineering Open Book
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Large Language Model Text Generation Inference
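A hedged sketch of calling a running text-generation-inference server over its HTTP `/generate` route; the host, port, and generation parameters are assumptions about your deployment:

```python
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```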
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
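A sketch of a Triton HTTP client call; the model name `my_model` and the tensor names `INPUT__0`/`OUTPUT__0` are assumptions about what the model repository declares:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("INPUT__0", data.shape, "FP32")]
inputs[0].set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=inputs)
print(result.as_numpy("OUTPUT__0").shape)
```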
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
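A minimal OpenVINO runtime sketch; `model.xml` is a placeholder for an IR file produced by the model conversion tools, and the input shape is assumed:

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, device_name="CPU")

# the shape must match the model's declared input; (1, 3, 224, 224)
# is a placeholder for a typical image classifier
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([data])[compiled.output(0)]
print(result.shape)
```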
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference lets you run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or on your laptop.
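In practice the "single line" is usually the base URL of the standard `openai` client, pointed at Xinference's OpenAI-compatible endpoint. A hedged sketch; the URL, port, and model name are assumptions about a local deployment:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # the changed line: Xinference's
    api_key="not-needed-locally",         # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="llama-2-chat",  # whichever model you launched in Xinference
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```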
💎 1MB lightweight face detection model
Runtime type system for IO decoding/encoding
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
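A minimal evasion-attack sketch with ART's Fast Gradient Method against a scikit-learn classifier; the toy data and `eps` value are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

# toy two-class data in [0, 1]
x = np.random.rand(200, 4).astype(np.float32)
y = (x[:, 0] > 0.5).astype(int)

model = LogisticRegression().fit(x, y)
classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))

attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)  # perturbed copies of x meant to flip predictions

print("clean accuracy:", model.score(x, y))
print("adversarial accuracy:", model.score(x_adv, y))
```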