
nccl

An open collection of methodologies to help with successful training of large language models.

Python
506
2 years ago

An open collection of implementation tips, tricks and resources for training large language models

Python
478
2 years ago

Best practices & guides on how to write distributed PyTorch training code

Python
467
6 months ago

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda
288
2 months ago

Distributed and decentralized training framework for PyTorch over a communication graph

Python
255
1 year ago

Federated Learning Utilities and Tools for Experimentation

Python
190
2 years ago

NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.

C++
120
2 years ago

N-Ways to Multi-GPU Programming

C
37
6 days ago

Examples of how to call collective operation functions in multi-GPU environments: simple uses of the broadcast, reduce, allGather, reduceScatter, and sendRecv operations (a minimal sketch of the broadcast pattern follows this entry).

34
2 years ago
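
As context for entries like this one, the sketch below shows a single-process, multi-GPU broadcast with the NCCL C API. It is an illustration only, not code from the repository: the element count is arbitrary, at most 8 devices are assumed so fixed-size arrays can be used, and error checking is omitted for brevity.

#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
  int nDev = 0;
  cudaGetDeviceCount(&nDev);
  if (nDev > 8) nDev = 8;  /* fixed-size arrays below, for brevity */

  int devs[8];
  float* buf[8];
  cudaStream_t streams[8];
  ncclComm_t comms[8];
  const size_t count = 1024;  /* illustrative element count */

  for (int i = 0; i < nDev; ++i) {
    devs[i] = i;
    cudaSetDevice(i);
    cudaMalloc((void**)&buf[i], count * sizeof(float));
    cudaStreamCreate(&streams[i]);
  }

  /* One communicator per device, all inside a single process. */
  ncclCommInitAll(comms, nDev, devs);

  /* In-place broadcast from rank 0; grouping makes the per-device
     calls behave as one collective launch. */
  ncclGroupStart();
  for (int i = 0; i < nDev; ++i)
    ncclBroadcast(buf[i], buf[i], count, ncclFloat, 0, comms[i], streams[i]);
  ncclGroupEnd();

  for (int i = 0; i < nDev; ++i) {
    cudaSetDevice(i);
    cudaStreamSynchronize(streams[i]);
  }
  for (int i = 0; i < nDev; ++i) ncclCommDestroy(comms[i]);
  return 0;
}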

A Julia wrapper for the NVIDIA Collective Communications Library.

Julia
28
1 day ago

Distributed non-negative matrix factorization in Python with custom clustering

Python
24
2 years ago

🎹 Instruct.KR 2025 Summer Meetup: Open-Source LLMs to Production with vLLM 🎹

Shell
21
18 days ago

NCCL examples from the official NVIDIA NCCL Developer Guide (a sketch of its one-device-per-process pattern follows this entry).

CMake
17
7 years ago
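
The guide's examples center on a one-device-per-process pattern bootstrapped with MPI. The sketch below is an assumed, simplified rendering of that pattern rather than the guide's verbatim code: the buffer size and the rank-to-GPU mapping are illustrative, and error checking is omitted.

#include <mpi.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(int argc, char* argv[]) {
  int rank, nranks;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  /* Rank 0 creates the unique id; MPI broadcasts it to all ranks. */
  ncclUniqueId id;
  if (rank == 0) ncclGetUniqueId(&id);
  MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

  cudaSetDevice(rank % 8);  /* simplification: assumes up to 8 GPUs per node */
  const size_t count = 1024;  /* illustrative element count */
  float* buf;
  cudaMalloc((void**)&buf, count * sizeof(float));

  ncclComm_t comm;
  ncclCommInitRank(&comm, nranks, id, rank);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  /* In-place allreduce: every rank ends with the element-wise sum. */
  ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream);
  cudaStreamSynchronize(stream);

  ncclCommDestroy(comm);
  cudaFree(buf);
  MPI_Finalize();
  return 0;
}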

High-performance NCCL plugin for Bagua.

Rust
15
4 years ago

🐍 PyCon Korea 2025 Tutorial: A Deep Dive into vLLM's OpenAI-Compatible Server 🐍

Shell
9
5 days ago

Uses ncclSend and ncclRecv to implement ncclSendrecv, ncclGather, ncclScatter, and ncclAlltoall (see the sketch after this entry).

Cuda
8
3 years ago
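
The technique this repository describes, composing richer collectives out of ncclSend/ncclRecv pairs, can be sketched as follows for an all-to-all exchange. This is an assumed illustration of the general pattern, not code taken from the repository; the helper name allToAll is hypothetical.

#include <cuda_runtime.h>
#include <nccl.h>

/* Hypothetical helper: each rank sends chunk r of sendbuf to rank r
   and receives chunk r of recvbuf from rank r. count is the number
   of floats exchanged with each peer. */
void allToAll(float* sendbuf, float* recvbuf, size_t count,
              int nranks, ncclComm_t comm, cudaStream_t stream) {
  /* Grouping the point-to-point calls lets NCCL progress them
     concurrently and avoids deadlock from send/recv ordering. */
  ncclGroupStart();
  for (int r = 0; r < nranks; ++r) {
    ncclSend(sendbuf + r * count, count, ncclFloat, r, comm, stream);
    ncclRecv(recvbuf + r * count, count, ncclFloat, r, comm, stream);
  }
  ncclGroupEnd();
}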

A summary of the call graphs and data structures of the NVIDIA Collective Communication Library (NCCL)

D2
7
1 year ago

NVIDIA NCCL2 Python bindings using ctypes and Numba.

Python
6
4 years ago