nccl

An open collection of methodologies to help with successful training of large language models.

Python
512
2 years ago

Best practices & guides on how to write distributed PyTorch training code

Python
488
7 months ago

An open collection of implementation tips, tricks and resources for training large language models

Python
481
3 years ago

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda
303
1 month ago

Distributed and decentralized training framework for PyTorch over graphs

Python
251
1 year ago

Federated Learning Utilities and Tools for Experimentation

Python
191
2 years ago

NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.

C++
120
2 years ago

N-Ways to Multi-GPU Programming

C
37
2 months ago

Examples of how to call collective-operation functions in multi-GPU environments: simple demonstrations of the broadcast, reduce, allGather, reduceScatter, and sendRecv operations (see the broadcast sketch after this entry).

35
2 years ago
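
For context, here is a minimal sketch of the single-process, multi-GPU pattern such examples use, shown for broadcast. It assumes two visible GPUs, an installed CUDA/NCCL toolchain, and an illustrative buffer size, and omits error checking:

/* bcast.c -- minimal NCCL broadcast sketch; compile with: nvcc bcast.c -lnccl */
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
  const int nDev = 2;            /* assumption: two visible GPUs */
  int devs[2] = {0, 1};
  const size_t count = 1 << 20;  /* illustrative: 1M floats per GPU */

  ncclComm_t comms[2];
  float* buf[2];
  cudaStream_t streams[2];

  /* One communicator per device, all owned by this single process. */
  ncclCommInitAll(comms, nDev, devs);

  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaMalloc((void**)&buf[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* Broadcast from rank 0 to all ranks. The group wrapper is required
     when one thread issues calls on several communicators. */
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclBroadcast(buf[i], buf[i], count, ncclFloat, 0, comms[i], streams[i]);
  ncclGroupEnd();

  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(devs[i]);
    cudaStreamSynchronize(streams[i]);
  }

  for (int i = 0; i < nDev; ++i) {
    cudaFree(buf[i]);
    ncclCommDestroy(comms[i]);
  }
  printf("broadcast done\n");
  return 0;
}

The other collectives in the repository (reduce, allGather, reduceScatter, sendRecv) follow the same init/group/sync skeleton with a different call inside the group.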

A Julia wrapper for the NVIDIA Collective Communications Library.

Julia
29
9 days ago

Python distributed Non-Negative Matrix Factorization with custom clustering

Python
24
2 years ago

🎹 Instruct.KR 2025 Summer Meetup: Open-Source LLMs, to Production with vLLM 🎹

Shell
21
2 months ago

NCCL examples from the official NVIDIA NCCL Developer Guide (see the multi-process sketch after this entry).

CMake
19
7 years ago
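
The guide's other canonical pattern, one GPU per MPI rank with the NCCL unique id bootstrapped over MPI, is sketched below for allreduce. This is a hedged sketch rather than the guide's verbatim code: the buffer size, the rank-to-device mapping, and the uninitialized payload are placeholders, and error checking is omitted:

/* allreduce_mpi.c -- compile with: mpicc allreduce_mpi.c -lnccl -lcudart */
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  int rank, nranks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  /* Rank 0 creates the NCCL id; every rank receives it over MPI. */
  ncclUniqueId id;
  if (rank == 0) ncclGetUniqueId(&id);
  MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

  cudaSetDevice(rank);           /* assumption: local rank == device index */
  const size_t count = 1 << 20;  /* illustrative size; data left uninitialized */
  float* buf;
  cudaMalloc((void**)&buf, count * sizeof(float));

  ncclComm_t comm;
  ncclCommInitRank(&comm, nranks, id, rank);

  cudaStream_t stream;
  cudaStreamCreate(&stream);
  ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream);  /* in place */
  cudaStreamSynchronize(stream);

  cudaFree(buf);
  ncclCommDestroy(comm);
  MPI_Finalize();
  return 0;
}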

High-performance NCCL plugin for Bagua.

Rust
15
4 years ago

🐍 PyCon Korea 2025 Tutorial: A Deep Dive into vLLM's OpenAI-Compatible Server 🐍

Shell
10
2 months ago

Uses ncclSend and ncclRecv to implement ncclSendrecv, ncclGather, ncclScatter, and ncclAlltoall (see the all-to-all sketch after this entry).

Cuda
8
4 years ago
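
For reference, the standard composition of an all-to-all from ncclSend/ncclRecv, the pattern this repository's description refers to. A sketch assuming a float payload and an already-initialized communicator; the helper name allToAll is illustrative and error handling is omitted:

/* All sends/recvs inside one group are matched and progressed together,
   which avoids deadlock between ranks. */
#include <nccl.h>
#include <cuda_runtime.h>

/* sendbuff and recvbuff each hold nranks * count floats, laid out as one
   contiguous count-sized chunk per peer. */
ncclResult_t allToAll(const float* sendbuff, float* recvbuff, size_t count,
                      ncclComm_t comm, cudaStream_t stream) {
  int nranks;
  ncclCommCount(comm, &nranks);

  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(sendbuff + peer * count, count, ncclFloat, peer, comm, stream);
    ncclRecv(recvbuff + peer * count, count, ncclFloat, peer, comm, stream);
  }
  return ncclGroupEnd();
}

A gather, scatter, or sendrecv reduces to the same loop, with only the root (or the single peer) posting the matching side of each transfer.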

Summary of the call graphs and data structures of the NVIDIA Collective Communication Library (NCCL)

D2
7
1 year ago

NVIDIA NCCL2 Python bindings using ctypes and Numba.

Python
6
4 years ago