The latest news and information on incident management, on-call, incident response and related technologies.
As companies race to build site reliability engineering (SRE) practices within their engineering teams, site reliability engineering has become one of the hottest and highest-paying jobs in tech. The term was coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure and functional.
Handover procedures in operations and maintenance are a key element of business continuity. As work in this field is usually organized in shifts, it is essential to keep track of critical incidents, machine breakdowns, job ownership and completion, issues that are still open or unresolved, and other related items. Such knowledge has a significant impact on a timely or even proactive response, for instance if issues resurface.
On October 7th we released Enterprise Alert version 8.5.1. This release includes the following enhancements.
Sharing information about the health and performance of an application is a critical part of any team’s daily workflow. That’s why we’re excited to announce the Datadog Slack App, which simplifies crucial communication tasks by deepening the integration between Datadog and Slack.
FireHydrant’s Slack integration is a great way to speed up your incident response, especially if FireHydrant Runbooks is automatically creating channels in your Slack workspace for each incident. “But what happens after the incident?” First of all, you shouldn’t have to manually archive those Slack channels, especially when you don’t want them clogging up the Slack navigation bar.
In a world that’s always on, keeping services up and running isn’t just ideal—it’s mission-critical for all of PagerDuty’s customers. It’s not lost on us that serving as the central nervous system for digital operations at some of the world’s largest companies is no small job.
Many organizations are transitioning toward a DevOps operational model, where software developers, rather than a centralized IT operations group, are responsible for operating the applications they develop. In this “CTO Perspective” interview we talk to BigPanda’s CTO Elik Eizenberg about the challenges in that transition, and what it takes to make it easier. Lean back and watch the interview, or if you prefer reading, take a few minutes to read the transcript.