cutlass

xlite-dev / CUDA-Learn-Notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

CUDA gemm cuda-kernels cuda-programming cudnn cutlass flash-attention

Cuda

3494

377

5 days ago

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

cutlass PyTorch CUDA gpu

C++

891

5 days ago

coderonion / awesome-cuda-and-hpc

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

CUDA cublas tensorrt Awesome Lists LLM gpu blas PyTorch hpc gemm llama cudnn triton tensorrt-llm cutlass mlir tvm deepseek ptx vlm

248

1 day ago

DD-DuDa / Cute-Learning

Examples of CUDA implementations by Cutlass CuTe

CUDA cutlass gpu

Makefile

156

3 months ago

leimao / CUTLASS-Examples

CUTLASS and CuTe Examples

CUDA cutlass Docker

Cuda

4 months ago

Bruce-Lee-LY / flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

CUDA flash-attention gpu inference large-language-model LLM Nvidia cutlass mha

C++

2 months ago

YashasSamaga / ConvolutionBuildingBlocks

GEMM and Winograd based convolutions using CUTLASS

deep-learning convolution CUDA cutlass

Cuda

5 years ago

yester31 / Cutlass_EX

study of cutlass

cmake C++ CUDA cutlass parallel-programming

Cuda

5 months ago

Bruce-Lee-LY / cutlass_gemm

Multiple GEMM operators are constructed with cutlass to support LLM inference.

cublas cutlass LLM Nvidia gemm gpu

C++

7 months ago

sgl-project / whl

Kernel Library Wheel for SGLang

CUDA cutlass

HTML

2 days ago

qdLMF / LightGlue-with-FlashAttentionV2-TensorRT

A cutlass cute implementation of headdim-64 flashattentionv2 TensorRT plugin for LightGlue. Run on Jetson Orin NX 8GB with TensorRT 8.5.2.

cute cutlass tensorrt feature-matching CUDA flash-attention multihead-attention transformer superpoint

Cuda

2 months ago

cjmcv / ai-infra-notes

Reading notes on the open source code of AI infrastructure (sglang, llm, cutlass, hpc, etc.)

CUDA cutlass hpc inference LLM mlsys simd gpu

14 days ago

digital-nomad-cheng / tvm_project_course

compiler CUDA cutlass neural-network tensorrt tvm

Python

1 year ago

Routhleck / blocksparse-pytorch-implement

A block-sparse implementation in PyTorch

CUDA cutlass matrix-multiplication Python PyTorch

C++

2 years ago

prateekshukla1108 / cutlass3

Docs

cutlass

HTML

7 days ago