inference
A high-throughput and memory-efficient inference and serving engine for LLMs
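A minimal sketch of vLLM's offline batch API, assuming a CUDA-capable GPU; the model name is just an example and any supported Hugging Face causal LM works:

```python
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, max_tokens=32)

llm = LLM(model="facebook/opt-125m")          # loads weights onto the GPU
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)             # first completion per prompt
```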
Making large AI models cheaper, faster, and more accessible
Port of OpenAI's Whisper model in C/C++
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
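A hedged sketch of wrapping a PyTorch model with `deepspeed.initialize`; the config values are illustrative, not recommendations, and real runs are typically launched with the `deepspeed` CLI:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},        # fp16 assumes a GPU is available
    "zero_optimization": {"stage": 1},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Training then goes through the engine's own step methods:
#   loss = model_engine(batch)
#   model_engine.backward(loss)
#   model_engine.step()
```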
Cross-platform, customizable ML solutions for live and streaming media.
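A short sketch using MediaPipe's legacy Solutions API for face detection; `photo.jpg` is a placeholder input image, and newer MediaPipe releases favor the Tasks API instead:

```python
import cv2
import mediapipe as mp

image = cv2.imread("photo.jpg")               # placeholder input
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB

with mp.solutions.face_detection.FaceDetection(
    model_selection=0, min_detection_confidence=0.5
) as detector:
    results = detector.process(rgb)
    for det in results.detections or []:      # detections is None if no faces
        box = det.location_data.relative_bounding_box
        print(f"face at x={box.xmin:.2f}, y={box.ymin:.2f}, "
              f"w={box.width:.2f}, h={box.height:.2f}")
```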
Faster Whisper transcription with CTranslate2
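A minimal transcription sketch with faster-whisper; the model size and `audio.mp3` are placeholders:

```python
from faster_whisper import WhisperModel

# int8 keeps this runnable on CPU; use device="cuda" for a GPU
model = WhisperModel("base", device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3")
print(f"detected language: {info.language}")

for segment in segments:  # segments is a lazy generator
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```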
Machine Learning Engineering Open Book
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
Large Language Model Text Generation Inference
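A hedged sketch of calling a running text-generation-inference server over its HTTP `/generate` route; the host, port, and generation parameters are assumptions about your deployment:

```python
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```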
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
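A sketch of a Triton HTTP client call; the model name `my_model` and the tensor names `INPUT__0`/`OUTPUT__0` are assumptions about what the model repository declares:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("INPUT__0", data.shape, "FP32")]
inputs[0].set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=inputs)
print(result.as_numpy("OUTPUT__0").shape)
```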
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
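A minimal OpenVINO runtime sketch; `model.xml` is a placeholder for an IR file produced by the model conversion tools, and the input shape is assumed:

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")
compiled = core.compile_model(model, device_name="CPU")

# the shape must match the model's declared input; (1, 3, 224, 224)
# is a placeholder for a typical image classifier
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
result = compiled([data])[compiled.output(0)]
print(result.shape)
```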
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference lets you run inference with any open-source language model, speech recognition model, or multimodal model, whether in the cloud, on-premises, or on your laptop.
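In practice the "single line" is usually the base URL of the standard `openai` client, pointed at Xinference's OpenAI-compatible endpoint. A hedged sketch; the URL, port, and model name are assumptions about a local deployment:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # the changed line: Xinference's
    api_key="not-needed-locally",         # OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="llama-2-chat",  # whichever model you launched in Xinference
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```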
💎 1MB lightweight face detection model
Runtime type system for IO decoding/encoding
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
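A minimal evasion-attack sketch with ART's Fast Gradient Method against a scikit-learn classifier; the toy data and `eps` value are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import FastGradientMethod

# toy two-class data in [0, 1]
x = np.random.rand(200, 4).astype(np.float32)
y = (x[:, 0] > 0.5).astype(int)

model = LogisticRegression().fit(x, y)
classifier = SklearnClassifier(model=model, clip_values=(0.0, 1.0))

attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)  # perturbed copies of x meant to flip predictions

print("clean accuracy:", model.score(x, y))
print("adversarial accuracy:", model.score(x_adv, y))
```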