trainium
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · 55736 stars · updated 17 minutes ago
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance types and serving stacks.
Jupyter Notebook · 249 stars · updated 4 months ago
A production-ready inference server supporting any AI model on all major hardware platforms (CPU, GPU, TPU, Apple Silicon). Inferno deploys and serves language models from Hugging Face, local files, or GGUF format, with automatic memory management and hardware optimization. Developed by HelpingAI.
Python · 3 stars · updated 4 months ago