# cutlass
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Cuda
3494
5 days ago
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
C++
891
5 days ago
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
248
1 day ago
Makefile
156
3 months ago
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
C++
35
2 months ago
GEMM and Winograd based convolutions using CUTLASS
Cuda
26
5 years ago
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.
Cuda
5
2 months ago
A PyTorch implementation of block-sparse operations
C++
1
2 years ago