# inference-optimization

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ · 886 stars · updated 8 months ago

The Tensor Algebra SuperOptimizer for Deep Learning

C++ · 730 stars · updated 3 years ago

Everything you need to know about LLM inference

TypeScript · 217 stars · updated 2 days ago

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ · 201 stars · updated 3 years ago

Batch normalization fusion for PyTorch. This repository is archived and no longer maintained.

Python · 197 stars · updated 5 years ago
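Folding batch normalization into the preceding convolution is a standard inference optimization: the BN layer's per-channel scale and shift are absorbed into the conv weights and bias, so inference runs one op instead of two. A minimal NumPy sketch of the folding arithmetic (illustrative, not this repository's API):

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding convolution.

    w: conv weight of shape (out_ch, in_ch, kh, kw); b: conv bias (out_ch,).
    The BN computes gamma * (conv(x) - mean) / sqrt(var + eps) + beta,
    which collapses into a rescaled weight and a shifted bias.
    """
    scale = gamma / np.sqrt(var + eps)         # per-output-channel scale
    w_fused = w * scale[:, None, None, None]   # rescale each output filter
    b_fused = (b - mean) * scale + beta        # fold mean/shift into the bias
    return w_fused, b_fused
```

The fused layer is numerically identical to conv followed by BN (up to floating-point rounding), which is why the fusion is safe at inference time but not during training, where the BN statistics still change.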

Optimize layers structure of Keras model to reduce computation time

Python · 157 stars · updated 5 years ago

A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.

Python · 80 stars · updated 6 years ago

Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)

Python · 64 stars · updated 5 months ago
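Global pruning chooses which weights to drop by comparing magnitudes across all layers at once rather than layer by layer, so low-magnitude layers lose proportionally more weights. SparseLLM's actual method is an optimization-based formulation; the sketch below shows only the simpler global-threshold idea for contrast (function name is illustrative):

```python
import numpy as np

def global_magnitude_prune(layers, sparsity=0.5):
    """Zero out the globally smallest-magnitude weights across all layers.

    Unlike per-layer pruning, a single threshold is computed over the
    concatenated magnitudes, so sparsity is allocated unevenly per layer.
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in layers])
    threshold = np.quantile(all_mags, sparsity)   # global cutoff
    return [np.where(np.abs(w) >= threshold, w, 0.0) for w in layers]
```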

[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Python · 41 stars · updated 3 months ago
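Diversity-based token pruning can be sketched as greedy max-min selection: repeatedly keep the token embedding farthest from everything already selected, so the retained subset covers the embedding space. This is a generic illustration of the idea, not the paper's implementation:

```python
import numpy as np

def diverse_token_subset(tokens, k):
    """Greedy max-min selection over token embeddings.

    tokens: (N, d) embeddings. Returns indices of k tokens, each chosen to
    maximize its minimum distance to the tokens already selected.
    """
    dists = np.linalg.norm(tokens[:, None] - tokens[None, :], axis=-1)
    selected = [0]                    # seed with the first token
    min_dist = dists[0].copy()        # distance of each token to the selected set
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dists[nxt])
    return selected
```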

Blog posts, reading reports, and code examples covering AGI/LLM-related knowledge.

Python · 40 stars · updated 7 months ago

Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.

Jupyter Notebook · 17 stars · updated 1 year ago
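KV caching, mentioned above, stores each generated token's key and value vectors so that later decoding steps attend over the cached history instead of recomputing it for the whole prefix. A single-head NumPy sketch (hypothetical class, not LoRAX's API):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only cache of past keys/values for one attention head."""
    def __init__(self, d):
        self.keys = np.empty((0, d))
        self.values = np.empty((0, d))

    def attend(self, q, k, v):
        # Store the new token's key/value, then attend over all cached ones.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])
        scores = self.keys @ q / np.sqrt(q.shape[-1])
        return softmax(scores) @ self.values
```

Per step, this turns an O(T^2) recomputation into an O(T) lookup, at the cost of the memory held by the cache, which is exactly why cache size dominates LLM serving capacity.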

Cross-platform, modular neural network inference library; small and efficient.

C++ · 13 stars · updated 2 years ago

A template for getting started writing code using GGML

C++ · 10 stars · updated 1 year ago

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

Python · 10 stars · updated 2 months ago
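Speculative decoding lets a cheap draft model propose several tokens that the target model then verifies, accepting the longest agreeing prefix, so the output is identical to decoding with the target alone. A greedy-decoding sketch with toy models as plain functions (real systems verify the whole draft in one parallel target pass):

```python
def speculative_decode(target, draft, prefix, k=4, steps=8):
    """Greedy speculative decoding sketch.

    `target` and `draft` map a token list to the next token (argmax decoding).
    The draft proposes k tokens; the target checks them and the longest
    agreeing prefix is accepted, plus one corrected token on a mismatch.
    """
    out = list(prefix)
    while len(out) - len(prefix) < steps:
        # Draft model speculates k tokens autoregressively (cheap).
        spec, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            spec.append(t)
            ctx.append(t)
        # Target model verifies each speculated position.
        for t in spec:
            expected = target(out)
            out.append(expected)       # always emit the target's token
            if expected != t:          # mismatch: discard the rest of the draft
                break
    return out[:len(prefix) + steps]
```

Because every emitted token comes from the target, the speedup is pure: a good draft just lets several target tokens be verified per pass instead of generated one at a time.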

Faster inference YOLOv8: optimize and export YOLOv8 models for faster inference using OpenVINO and NumPy 🔢

Python · 10 stars · updated 8 months ago

LLM-Rank: a graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.

Python · 6 stars · updated 9 months ago
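PageRank centrality on a weighted graph can be computed by power iteration, and a centrality-based pruner then drops the least central nodes. A generic sketch of that idea, not the paper's code:

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted adjacency matrix.

    adj[i, j] is the edge weight from node i to node j; rows are normalized
    into transition probabilities before iterating.
    """
    n = adj.shape[0]
    row_sums = adj.sum(axis=1, keepdims=True)
    # Dangling rows (no outgoing edges) fall back to a uniform transition.
    P = np.divide(adj, row_sums, out=np.full_like(adj, 1.0 / n),
                  where=row_sums > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (r @ P)
    return r

def prune_least_central(weights, keep_ratio=0.5):
    """Return indices of the most central fraction of nodes."""
    scores = pagerank(weights)
    k = max(1, int(len(scores) * keep_ratio))
    return np.argsort(scores)[::-1][:k]
```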

Dynamic Attention Mask (DAM) generates adaptive sparse attention masks per layer and head for Transformer models, enabling long-context inference with lower compute and memory overhead, without fine-tuning.

Python · 6 stars · updated 2 months ago
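One way to sketch an adaptive sparse mask: from one head's raw attention logits, keep a local window plus the top-k strongest earlier positions per query, yielding a sparse causal mask. An illustrative NumPy sketch with hypothetical parameters, not DAM's actual algorithm:

```python
import numpy as np

def adaptive_sparse_mask(scores, top_k=4, window=2):
    """Build a boolean causal attention mask that keeps, for each query,
    its local window plus the top_k highest-scoring earlier positions.

    scores: (T, T) raw attention logits for one head.
    """
    T = scores.shape[0]
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = np.zeros((T, T), dtype=bool)
    for q in range(T):
        allowed = np.where(causal[q])[0]
        # Always keep a local window of recent tokens (including q itself).
        local = allowed[allowed >= q - window]
        # Add the strongest allowed positions by score.
        ranked = allowed[np.argsort(scores[q, allowed])[::-1]]
        keep = set(local.tolist()) | set(ranked[:top_k].tolist())
        mask[q, list(keep)] = True
    return mask
```

Each row keeps at most window + 1 + top_k entries, so attention cost per query is bounded regardless of context length, which is the point of sparse long-context inference.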

Your AI Catalyst: inference backend to maximize your model's inference performance

C++ · 5 stars · updated 8 months ago

A constrained expectation-maximization algorithm for feasible graph inference.

Jupyter Notebook · 4 stars · updated 4 years ago