kernel-fusion

Burn is a next-generation deep learning framework that doesn't compromise on flexibility, efficiency, or portability.

Rust · 12628 stars · Updated 37 minutes ago

An efficient concurrent graph processing system

C++ · 46 stars · Updated 4 years ago

GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving

HTML · 17 stars · Updated 21 days ago

GPU fusion code and algorithm

Cuda · 1 star · Updated 1 year ago

Compile-time kernel fusion and expression trees as an Alpaka backend for boost.odeint. This is my team's project, developed in collaboration with, and under the supervision of, HZDR.

C++ · 1 star · Updated 1 year ago

High-performance CUDA implementation of LayerNorm for PyTorch, achieving a 1.46x speedup through kernel fusion. Optimized for large language models (hidden dimensions of 4K to 8K) with vectorized memory access, warp-level primitives, and mixed-precision support. A drop-in replacement for nn.LayerNorm with a 25% memory reduction.

Python · 0 stars · Updated 2 days ago

Mabor is a cutting-edge deep learning framework built for flexibility, efficiency, and portability, without compromise.

Rust · 0 stars · Updated 1 day ago