trainium
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · 55736 stars · updated 17 minutes ago
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark its performance across instance types and serving stacks.
Jupyter Notebook · 249 stars · updated 4 months ago
A production-ready inference server supporting any AI model on all major hardware platforms (CPU, GPU, TPU, Apple Silicon). Inferno deploys and serves language models from Hugging Face, local files, or GGUF format, with automatic memory management and hardware optimization. Developed by HelpingAI.
Python · 3 stars · updated 4 months ago