# cublas

SCUDA is a GPU-over-IP bridge that allows GPUs on remote machines to be attached to CPU-only machines.

C++
1752
2 months ago

Python interface to GPU-powered libraries

Python
993
2 years ago

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda
459
1 year ago

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

312
18 days ago

Hooks for CUDA-related dynamic libraries, built using automated code generation tools.

C
165
2 years ago

Deep learning library using the GPU (CUDA/cuBLAS)

Elixir
94
4 years ago

Parallel Computing Library for Linux and macOS & NVIDIA CUDA Wrapper

Swift
82
8 years ago

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Cuda
63
1 year ago

A deep learning framework with very few dependencies, written in Rust

Rust
63
6 months ago

Algorithms implemented in CUDA + resources about GPGPU

Cuda
56
4 years ago

Harness the power of GPU acceleration for fusing visual odometry and IMU data with an advanced Unscented Kalman Filter (UKF) implementation. Developed in C++ and utilizing CUDA, cuBLAS, and cuSOLVER, this system offers unparalleled real-time performance in state and covariance estimation for robotics and autonomous system applications.

Cuda
34
1 year ago

Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm

Cuda
34
6 years ago

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

C++
32
5 months ago

Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm

Cuda
32
3 years ago

Examples showing how to utilize the NVML library for GPU monitoring

C++
28
3 years ago