#

ptx

ashvardanian/less_slow.cpp

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO

C++
1837
6 days ago
C#
1603
10 hours ago

row-major matmul optimization

C++
654
2 years ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

311
18 days ago

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.

C#
121
3 years ago

Free software file format parser for Avid ProTools sessions

C++
80
2 years ago

A simple profiler that counts NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

C++
55
5 months ago

Energinet's Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.

Python
40
13 hours ago

CUDA kernels in any language supported by LLVM

Rust
29
2 years ago

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.

HTML
25
25 days ago

GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving

HTML
17
21 days ago

Set of examples written for hardware acceleration via TornadoVM

Java
17
6 months ago

Compile Rust into PTX

Rust
14
6 years ago

Inline PTX Assembly in CUDA example

Cuda
12
3 years ago

Optimizing GPU compiler and database system for NVIDIA hardware

C++
11
3 years ago

VeriBlock CUDA PoW Miner

Cuda
9
7 years ago

Compile MLIR to PTX and execute it on NVIDIA GPUs

Jupyter Notebook
7
4 months ago

Bloch's equations and Optimal Control for MRI and NMR applications

MATLAB
6
5 years ago

bus tracker for Taiwanese CLI

Python
5
4 years ago

FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with autodifferentiation

Python
5
1 year ago