checkpointing
A GPipe implementation in PyTorch
An I/O benchmark for deep learning applications
Cedana: Access and run on compute anywhere in the world, on any provider. Migrate seamlessly between providers, arbitraging price/performance in real time to maximize pure runtime.
Keras wrapper that autosaves what ModelCheckpoint cannot.
Extending DOLFINx with checkpointing functionality
This Flink project consumes streams from one Azure Event Hub and produces to a different Event Hub; it also includes the config files for deploying it on Kubernetes.
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
A standalone Flink producer used for testing the contents of the flink-consume-produce-ek repo.
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
A lightweight checkpointing program written in C.
A shared library to help test your code with failure-injection
DMTCP scripts to get Python scripts working with SLURM.
A digital album face recognition manager that isolates images of a specified person from a digital album.
Compile a torch model to a checkpointed model
🌀 data objects for Bash (attempt one).
Koo and Toueg’s checkpointing and recovery protocol
A Python package for checkpointing, saving, and loading objects.