#ptx

C#
1494
2 days ago
ashvardanian/less_slow.cpp

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO

C++
1036
7 hours ago

row-major matmul optimization

C++
624
2 years ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

248
1 hour ago

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.

C#
117
2 years ago

Free software file format parser for Avid ProTools sessions

C++
73
2 years ago

A simple profiler that counts NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

C++
50
1 month ago

Energinet's Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.

Python
33
18 days ago

CUDA kernels in any language supported by LLVM

Rust
26
2 years ago

Set of examples written for hardware acceleration via TornadoVM

Java
16
2 months ago

Compile Rust into PTX

Rust
14
5 years ago

Inline PTX Assembly in CUDA example

Cuda
10
3 years ago

VeriBlock CUDA PoW Miner

Cuda
9
6 years ago

Optimizing GPU compiler and database system for NVIDIA hardware

C++
8
3 years ago

Bloch's equations and Optimal Control for MRI and NMR applications

MATLAB
6
4 years ago

Command-line bus tracker for Taiwan

Python
5
4 years ago

Visual Studio Code extension with PTX assembly syntax support

4
3 years ago

FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with automatic differentiation

Python
4
1 year ago

Unofficial Golang client library for the Public Transport Data eXchange (PTX) platform

Go
3
3 years ago

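🎉 Continuously updated: study notes on CUDA 12.2 PTX-ISA-8.2, with partial Chinese translation, personal commentary, and inline assembly examples, explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; in progress...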

3
2 years ago