
slurm

Slurm: A Highly Scalable Workload Manager

C
3223
17 hours ago

dstack is an open-source control plane for running development, training, and inference jobs on GPUs—across hyperscalers, neoclouds, or on-prem.

Python
1858
17 hours ago

Python 3.8+ toolbox for submitting jobs to Slurm

Python
1493
3 months ago
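The listing does not show how a Python toolbox like this talks to Slurm. The sketch below makes some assumptions and is not that toolbox's API. It uses only the standard library and the `sbatch` CLI. It relies on two documented `sbatch` behaviours: it reads the batch script from standard input when no file name is given, and it reports success as `Submitted batch job <id>`. The helper names `build_sbatch_script`, `parse_job_id`, and `submit` are hypothetical.

```python
import re
import subprocess


def build_sbatch_script(command, job_name="example", time="00:10:00", nodes=1):
    """Hypothetical helper: render a minimal batch script with #SBATCH directives."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time}",
        f"#SBATCH --nodes={nodes}",
        command,
    ]
    return "\n".join(lines) + "\n"


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


def submit(command, **kwargs):
    """Pipe the script to sbatch on stdin (requires a Slurm installation)."""
    script = build_sbatch_script(command, **kwargs)
    result = subprocess.run(
        ["sbatch"], input=script, capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

On a login node, `submit("python train.py", job_name="demo")` would return the new job's ID as an integer. Keeping script rendering and output parsing separate from the `subprocess` call means both can be checked without a cluster.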

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

Python
914
17 hours ago

Python Interface to Slurm

Cython
534
1 month ago

Open source web interface for Slurm HPC & AI clusters

Python
475
1 month ago

Best practices & guides on how to write distributed pytorch training code

Python
467
6 months ago

Lightweight fast function pipeline (DAG) creation in pure Python for scientific workflows 🕸️🧪

Python
392
1 day ago

A Slurm cluster using docker-compose

Dockerfile
387
1 month ago

TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.

Python
382
1 day ago

Create clusters of VMs on the cloud and configure them with Ansible.

Python
337
2 years ago

An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.

YAML
262
21 hours ago

Prometheus exporter for performance metrics from Slurm.

Go
257
1 year ago
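As a rough illustration of what a Slurm metrics exporter does, here is a minimal sketch. It is not that project's code or metric naming. It counts jobs per state from `squeue -h -o "%T"` output, where `-h` suppresses the header and `%T` prints the job state in extended form. It then renders the counts in the Prometheus text exposition format. The metric name `slurm_jobs` and the helper names are hypothetical.

```python
import subprocess
from collections import Counter


def count_job_states(squeue_output):
    """Count jobs per state, one state name (e.g. RUNNING) per line."""
    return Counter(line.strip() for line in squeue_output.splitlines() if line.strip())


def to_prometheus_text(counts, metric="slurm_jobs"):
    """Render counts as a gauge in Prometheus text format (hypothetical metric name)."""
    lines = [
        f"# HELP {metric} Number of Slurm jobs by state.",
        f"# TYPE {metric} gauge",
    ]
    for state in sorted(counts):
        lines.append(f'{metric}{{state="{state}"}} {counts[state]}')
    return "\n".join(lines) + "\n"


def scrape():
    """Query squeue on a Slurm host and return the exposition text."""
    out = subprocess.run(
        ["squeue", "-h", "-o", "%T"], capture_output=True, text=True, check=True
    ).stdout
    return to_prometheus_text(count_job_states(out))
```

A real exporter would serve `scrape()` over HTTP for Prometheus to poll. This sketch keeps parsing and formatting pure so they can be tested without a cluster.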

TUI for the Slurm Workload Manager

Rust
191
2 months ago