CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA. With CUDA, developers can dramatically boost computing performance by harnessing the processing power of GPUs.
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Build and run Docker containers leveraging NVIDIA GPUs
Instant neural graphics primitives: lightning fast NeRF and more
kaldi-asr/kaldi is the official location of the Kaldi project.
Burn is a next-generation deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
Open3D: A Modern Library for 3D Data Processing
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Containers for machine learning
A fast, scalable, high-performance library for gradient boosting on decision trees, used for ranking, classification, regression, and other machine learning tasks in Python, R, Java, and C++. Supports computation on CPU and GPU.
Samples for CUDA developers which demonstrate features in the CUDA Toolkit
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉