
# fp8

A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.
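The core mechanic behind FP8 training is per-tensor scaling: each tensor carries a scale factor so its values fit FP8's narrow dynamic range. A minimal NumPy sketch of that idea (not this library's actual API; the E4M3 maximum of 448 is from the OCP FP8 spec, and real FP8 would additionally round to 3 mantissa bits, which this sketch omits):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in the OCP FP8 E4M3 format

def fp8_scale_and_clamp(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Scale a tensor so its largest magnitude maps onto the FP8 E4M3
    range, then clamp; dequantize by dividing by the returned scale."""
    amax = float(np.max(np.abs(x))) or 1.0          # avoid divide-by-zero
    scale = E4M3_MAX / amax
    scaled = np.clip(x * scale, -E4M3_MAX, E4M3_MAX)
    return scaled, scale

x = np.array([0.5, -2.0, 1.0])
scaled, s = fp8_scale_and_clamp(x)   # s == 448 / 2 == 224
restored = scaled / s                # recovers x (up to FP8 rounding)
```

In practice libraries track a running amax history across iterations rather than recomputing the scale from each tensor, but the scale-then-clamp step is the same.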

Python · 2660 · updated 3 hours ago

Microsoft Automatic Mixed Precision Library

Python · 616 · updated 1 year ago

An innovative library for efficient LLM inference via low-bit quantization

C++ · 349 · updated 1 year ago

Flux diffusion model implementation using quantized FP8 matmuls, with the remaining layers using faster half-precision accumulation; roughly 2x faster on consumer devices.

Python · 274 · updated 10 months ago

JAX Scalify: end-to-end scaled arithmetic

Python · 16 · updated 10 months ago

A modular, accelerator-ready machine learning framework built in Go that speaks float8/16/32/64. Designed with clean architecture, strong typing, and native concurrency for scalable, production-ready AI systems. Ideal for engineers who value simplicity, speed, and maintainability.

Go · 3 · updated 16 days ago

Cog Single GPU Quantized Implementation of Step-Video-T2V

Python · 1 · updated 6 months ago

Slow, low-precision floating point types

Julia · 1 · updated 2 days ago

Python implementations of multi-precision quantization for computer vision and sensor fusion workloads, targeting the XR-NPE Mixed-Precision SIMD Neural Processing Engine. Includes visual-inertial odometry (VIO), object classification, and eye-gaze extraction in FP4, FP8, Posit4, Posit8, and BF16 formats.

Jupyter Notebook · 1 · updated 3 days ago

FP8 dtypes enumeration in Python

C++ · 0 · updated 2 years ago