cuda-kernels

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C
7317
1 month ago
Python
6136
2 days ago

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

Rust
4273
2 days ago
xlite-dev/CUDA-Learn-Notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda
3487
4 days ago

CUDA Kernel Benchmarking Library

Cuda
620
4 days ago

Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.

C++
347
3 years ago

An archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010.

C++
217
3 years ago

Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.

Cuda
183
5 days ago

Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, and AMD hardware without writing any additional C kernel code. Write your function in .NET and Amplifier will take care of running it on your favorite hardware.

C#
178
10 days ago

Some CUDA design patterns and a bit of template magic for CUDA

C++
150
2 years ago

Spiking Neural Networks in C++ with strong GPU acceleration through CUDA

Cuda
127
5 years ago

Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research

C++
110
2 years ago

Triton implementation of FlashAttention2 that adds custom masks.

Python
109
8 months ago

High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.

Cuda
105
9 months ago

TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.

Cuda
81
9 days ago