arm-neon

C++
21330
1 hour ago
microsoft/DirectXMath

DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

C++
1637
10 days ago
ashvardanian/SimSIMD

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐

C
1328
19 days ago

FeatherCNN is a high performance inference engine for convolutional neural networks.

C++
1217
6 years ago
C++
365
11 days ago

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html

C++
339
1 year ago

Heterogeneous Run Time version of Caffe. Adds heterogeneous capabilities to Caffe, using a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original Caffe architecture, so users can deploy their applications seamlessly.

C++
268
7 years ago

ARM NEON documentation and instruction meanings

241
6 years ago

Benchmark for embedded-AI deep learning inference engines, such as NCNN / TNN / MNN / TensorFlow Lite etc.

Python
204
4 years ago

RV: A Unified Region Vectorizer for LLVM

C++
107
3 months ago

Heterogeneous Run Time version of MXNet. Adds heterogeneous capabilities to MXNet, using a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original MXNet architecture, so users can deploy their applications seamlessly.

C++
72
7 years ago

Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"

C
52
3 months ago

Single-header, quite fast QOI (Quite OK Image Format) implementation written in C++20

C++
36
1 month ago

Heterogeneous Run Time version of TensorFlow. Adds heterogeneous capabilities to TensorFlow, using a heterogeneous computing infrastructure framework to speed up deep learning on Arm-based heterogeneous embedded platforms. It also retains all the features of the original TensorFlow architecture, so users can deploy their applications seamlessly.

C++
36
7 years ago

NEON ARMv8 SHA3_2x: two SHA3 or SHAKE128/256 computations in one call. Used in a post-quantum cryptography submission.

C
7
3 years ago

Simple neural network microkernels in C accelerated with ARMv8.2-a Neon vector intrinsics.

C
4
1 year ago

Hardkernel Odroid HC4 Ubuntu 20.04LTS install tutorial & tool build

Shell
3
4 years ago

A low-level C++ Template SIMD Library

C++
3
2 months ago

Colorful Mandelbrot set renderer in C# + OpenGL + ARM NEON

C#
3
1 year ago