cuda-programming
A General-purpose Task-parallel Programming System using Modern C++
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Sample codes for my CUDA programming book
Safe Rust wrapper around the CUDA toolkit
TinyChatEngine: On-Device LLM Inference Library
Thin, unified, C++-flavored wrappers for the CUDA APIs
LLM notes, including model inference, transformer model structure, and LLM framework code analysis.
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
A self-learning tutorial for CUDA High Performance Programming.
A simple GPU hash table implemented in CUDA using lock-free techniques
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
CUDA tutorials for maths and ML with examples, covering multi-GPU programming, fused attention, Winograd convolution, and reinforcement learning.
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
An implementation of HIP that works on CPUs, across OSes.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
CUDA kernel author's tools
Speed up image preprocessing with CUDA for image handling or TensorRT inference