#triton

Efficient Triton Kernels for LLM Training

Python
4880
3 days ago

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook
1565
1 year ago

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda
1342
4 days ago

A service for autodiscovery and configuration of applications running in containers

Go
1133
2 years ago

Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.

LLVM
826
1 year ago

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

662
5 days ago

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python
532
1 day ago

FlagGems is an operator library for large language models implemented in Triton Language.

Python
493
1 day ago

Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series

C
440
4 months ago

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

Python
336
1 month ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

248
1 hour ago

OpenDILab RL HPC OP Lib, including CUDA and Triton kernels

Python
226
10 months ago

SymGDB - symbolic execution plugin for gdb

Python
216
7 years ago

A performance library for machine learning applications.

Python
184
2 years ago

NVIDIA-accelerated, deep learned model support for image space object detection

C++
148
2 months ago

(WIP) This deployment framework aims to provide a simple, lightweight, quickly integrated, pipelined way to deploy algorithm services while ensuring their reliability, high concurrency, and scalability.

Python
137
4 years ago