cutlass
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
C++
1067
2 days ago
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
311
18 days ago
Makefile
219
2 months ago
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
C++
40
6 months ago
GEMM and Winograd based convolutions using CUTLASS
Cuda
26
5 years ago
A CUTLASS CuTe implementation of a headdim-64 FlashAttention v2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.
Cuda
10
6 months ago
A PyTorch implementation of block-sparse operations
C++
1
2 years ago
C++
0
3 months ago