SRE

Wikipedia

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

upgundecha/howtheysre

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

JavaScript
9470
1 month ago

An easy-to-use and powerful chaos engineering experiment toolkit, open-sourced by Alibaba.

Go
6211
9 days ago
litmuschaos/litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

Go
4897
17 days ago

Making on-call suck less for engineers

Python
720
1 year ago

This repository includes resources that are more than sufficient to prepare for a Google interview if you are applying for a software engineer or site reliability engineer position.

706
3 years ago

Curated list of good SRE interview questions.

391
3 years ago

A chaos engineering platform for supporting the complete fault drill lifecycle.

Go
321
1 year ago

DevOps Happiness: for AI agents and humans. Deploy apps and infra to any cloud in minutes. Fast, simple, cloud-native 🚀

Python
309
1 day ago