cutlass

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++
1067
2 days ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

311
18 days ago

Examples of CUDA implementations using CUTLASS CuTe

Makefile
219
2 months ago

CUTLASS and CuTe Examples

Cuda
71
1 month ago

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

C++
40
6 months ago

GEMM and Winograd based convolutions using CUTLASS

Cuda
26
5 years ago

Multiple GEMM operators built with CUTLASS to support LLM inference.

C++
19
17 days ago

Kernel Library Wheel for SGLang

HTML
11
2 days ago

A CUTLASS CuTe implementation of a head-dim-64 FlashAttention v2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.

Cuda
10
6 months ago

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

3
1 month ago

Lightweight, production-level open-source C++ library

C++
0
3 months ago

This repository showcases common optimization techniques for kernels.

Cuda
0
2 months ago