kernel-fusion
Burn is a next-generation deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
Compile-time kernel fusion and expression trees, implemented as an Alpaka backend for boost.odeint. This is my team project, developed in collaboration with and under the supervision of HZDR.
High-performance CUDA implementation of LayerNorm for PyTorch, achieving a 1.46x speedup through kernel fusion. Optimized for large language models (4K-8K hidden dimensions) with vectorized memory access, warp-level primitives, and mixed-precision support. Drop-in replacement for nn.LayerNorm with a 25% memory reduction.
Mabor is a cutting-edge deep learning framework built for flexibility, efficiency, and portability—without compromise.