blackwell

A high-throughput and memory-efficient inference and serving engine for LLMs

gpt large-language-models PyTorch model-serving transformer llm-serving inference llama amd CUDA tpu deepseek qwen blackwell deepseek-v3 gpt-oss kimi moe openai qwen3

Python

Stars: 59432

Forks: 10514

Updated 10 hours ago

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

CUDA inference llama llava large-language-models llm-serving moe PyTorch transformer vlm llama3 deepseek deepseek-v3 deepseek-r1 qwen3 blackwell openai kimi gpt-oss

Python

Stars: 18602

Forks: 3070

Updated 3 hours ago

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

blackwell CUDA moe PyTorch llm-serving

C++

Stars: 11772

Forks: 1779

Updated 1 day ago