# trainium
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · 45285 stars · updated 3 hours ago
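The description matches vLLM's README; assuming the repository is vLLM, a minimal offline-inference sketch of its Python API might look like the following (the model name and sampling settings are illustrative, not taken from the listing):

```python
# Minimal sketch, assuming the engine described above is vLLM (offline Python API).
# Model name and sampling settings are illustrative, not from the listing.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model for a quick smoke test
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["What is AWS Trainium?"], params)
for out in outputs:
    print(out.outputs[0].text)
```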
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark performance across instance types and serving-stack options.
Jupyter Notebook · 238 stars · updated 8 days ago
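The listing does not show this tool's interface, so the following is only a generic sketch of the kind of measurement it describes: timing requests against a serving endpoint and reporting latency percentiles and throughput. The endpoint URL and payload schema are assumptions (an OpenAI-style completions route), not this tool's actual CLI or API.

```python
# Generic latency/throughput measurement sketch; not this tool's actual interface.
# Endpoint URL and payload schema are assumptions (OpenAI-style completions route).
import statistics
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical serving endpoint
PAYLOAD = {"model": "my-model", "prompt": "Hello, Trainium!", "max_tokens": 64}

latencies = []
for _ in range(20):
    start = time.perf_counter()
    resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=60)
    resp.raise_for_status()
    latencies.append(time.perf_counter() - start)

latencies.sort()
p50 = statistics.median(latencies)
p95 = latencies[int(0.95 * (len(latencies) - 1))]
print(f"p50={p50:.3f}s  p95={p95:.3f}s  throughput={len(latencies)/sum(latencies):.2f} req/s")
```

Requests are issued sequentially here, so throughput is simply requests divided by total elapsed time; a concurrent client would be needed to probe saturation behaviour.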
A production-ready inference server supporting any AI model on all major hardware platforms (CPU, GPU, TPU, Apple Silicon). Inferno seamlessly deploys and serves language models from Hugging Face, local files, or GGUF format with automatic memory management and hardware optimization. Developed by HelpingAI.
Python · 3 stars · updated 4 days ago
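Inferno's own API is not shown in the listing; as a hedged illustration of what serving a GGUF model from a local file involves, here is a sketch using llama-cpp-python (an assumption for illustration, not necessarily Inferno's backend). The model path and prompt are hypothetical.

```python
# Illustration only: loading and querying a local GGUF model with llama-cpp-python.
# This is NOT Inferno's API; path, context size, and prompt are hypothetical.
from llama_cpp import Llama

llm = Llama(model_path="./models/example.gguf", n_ctx=2048)
result = llm("Q: What is AWS Trainium? A:", max_tokens=64, stop=["\n"])
print(result["choices"][0]["text"])
```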