cutlass

xlite-dev/CUDA-Learn-Notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda
3494
5 days ago

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++
891
5 days ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

248
1 day ago

Examples of CUDA implementations using CUTLASS CuTe

Makefile
156
3 months ago

CUTLASS and CuTe Examples

Cuda
47
4 months ago

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

C++
35
2 months ago

GEMM- and Winograd-based convolutions using CUTLASS

Cuda
26
5 years ago

Multiple GEMM operators built with CUTLASS to support LLM inference.

C++
17
7 months ago

Kernel Library Wheel for SGLang

HTML
8
2 days ago

A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.

Cuda
5
2 months ago

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

3
14 days ago