Making Your On-call and Incident Management Program Stick
Maintenance of your incident management practice is as important as creation - find out what you can do to keep your engineering organization strong and consistent year over year.
The latest News and Information on Service Reliability Engineering and related technologies.
Maintenance of your incident management practice is as important as creation - find out what you can do to keep your engineering organization strong and consistent year over year.
Site Reliability Engineer (SRE) is one of the fastest growing jobs in tech, with Linkedin reporting 34% growth YoY in 2020 and over 9000 openings in their Emerging Jobs Report. If you’re new to SRE and exploring it as a career path, understand that it can be a challenging but rewarding experience. Here are some quick tips on how you can get started with SRE and jump-start a rewarding career.
The Four Golden Signals of monitoring and observability get a lot of things right. But they could be even better.
A look at outages and disruptions to the IT systems that power the Olympics, from 1996 to today.
Although the fundamental concepts of site reliability engineering are the same in any environment, SREs must adapt practices to different technologies, like microservices.
Site Reliability Engineering (SRE) and Operations (Ops) teams heavily rely on notifications. We use them to know what’s going on with application workloads and how applications are performing. Notifications are critical to ensuring SREs and Ops teams can resolve errors and reduce downtime. They’re also crucial when monitoring environments — not only when running in production but also during the dev-test or staging phase.
It's time to break down the silos separating SREs from security engineers.