inferentia
A high-throughput and memory-efficient inference and serving engine for LLMs
Large-scale LLM inference engine
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark for performance across instance type and serving stack options.
This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It covers the basic implementation requirements, as well as how to pack thousands of unique PyTorch deep learning (DL) models into a scalable architecture and evaluate their performance.
CMP314 Optimizing NLP models with Amazon EC2 Inf1 instances in Amazon SageMaker
Collection of best practices, reference architectures, examples, and utilities for foundation model development and deployment on AWS.
This repository provides an easy, hands-on way to get started with AWS Inferentia. A demonstration of this hands-on workshop can be seen in the AWS Innovate 2023 - AIML Edition session.
Sentence Transformers on EC2 Inf1
Scalable multimodal AI system combining FSDP, RLHF, and Inferentia optimization for customer insights generation.
Deploy Large Models on AWS Inferentia (Inf2) instances.