Latest News

What is expected in the SRE role? We analyzed 30 job postings to find out.

In 2016, Google released the definitive book on Site Reliability Engineering (SRE), a practice that originated at the company to solve a monumental problem: how to keep Google's services running with high reliability. Over the years, SRE has been widely adopted by dev teams across the globe and is now a popular role at startups and enterprises alike. Here is a look at how search interest in SRE has trended over the years.

A Migration That Paid Tech Dividends

TL;DR: Old, deprecated code and infrastructure are a challenge that every engineer will come across. Remedy what you can, and remember that some extra effort can go a long way: it can uncover issues that, once addressed, will save you trouble in the future. Part of the challenge of software development is maintaining legacy code and infrastructure. When you ignore or neglect them, issues start to pop up and your reliability suffers, causing pain for your customers. The trick is to actively steward each project.

DevOps vs. Agile

Curious about the differences between the DevOps and Agile development methodologies? We'll explore and compare both approaches. What are the key differences between DevOps and Agile? Both are methodologies that share the goal of producing software quickly. In DevOps, Development and Operations work together closely throughout the software development lifecycle. Agile is an iterative approach that focuses on deploying releases rapidly with small teams.

How Lowe's SRE reduced its mean time to recovery (MTTR) by over 80 percent

The stakes of managing Lowes.com have never been higher, and that means spotting, troubleshooting and recovering from incidents as quickly as possible, so that customers can continue to do business on our site. To do that, it’s crucial to have solid incident engineering practices in place. Resolving an incident means mitigating the impact and/or restoring the service to its previous condition.

Essential Tools for Site Reliability Engineers

Site reliability engineers (SREs) are involved in scaling systems and making them reliable and efficient for organizations. But SREs often fail to build system resiliency when they do not have the right tools at their disposal. In this post, we’ll uncover five leading tools that SREs can use to drive the reliability and stability of computing systems. We’ll also examine how SREs can use these tools to improve operations tasks and infrastructure processes.