#

arm-neon

C++
21925
11 hours ago
microsoft/DirectXMath

DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

C++
1696
1 day ago
ashvardanian/SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐

C
1453
2 days ago

FeatherCNN is a high performance inference engine for convolutional neural networks.

C++
1218
6 years ago
C++
413
10 days ago

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html

C++
346
1 year ago

Heterogeneous Run Time version of Caffe. It adds heterogeneous capabilities to Caffe, using a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original Caffe architecture, so users can deploy their applications seamlessly.

C++
269
7 years ago

Arm NEON documentation and the meaning of its instructions

243
6 years ago

Benchmark for embedded-AI deep learning inference engines, such as NCNN, TNN, MNN, and TensorFlow Lite.

Python
204
5 years ago

RV: A Unified Region Vectorizer for LLVM

C++
111
3 months ago

Heterogeneous Run Time version of MXNet. It adds heterogeneous capabilities to MXNet, using a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original MXNet architecture, so users can deploy their applications seamlessly.

C++
72
8 years ago

Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"

C
53
7 months ago

Single-header, quite fast QOI (Quite OK Image Format) implementation written in C++20

C++
38
3 months ago

Heterogeneous Run Time version of TensorFlow. It adds heterogeneous capabilities to TensorFlow, using a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original TensorFlow architecture, so users can deploy their applications seamlessly.

C++
36
8 years ago

NZ1 - NanoZip 1: ultra-fast, dependency-free, portable C compression library optimized for embedded and high-performance use. Full docs: https://ferki-git-creator.github.io/nz1-site/

C
23
6 days ago

NEON ARMv8 SHA3_2x: computes two SHA3 or SHAKE128/256 hashes in one call. Used in a post-quantum cryptography submission.

C
8
3 years ago

Simple neural network microkernels in C accelerated with ARMv8.2-a Neon vector intrinsics.

C
5
2 years ago

A low-level C++ Template SIMD Library

C++
4
24 days ago

Hardkernel Odroid HC4 Ubuntu 20.04 LTS install tutorial & tool build

Shell
3
4 years ago