
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Modern IT Systems Have Outgrown Traditional Monitoring

Legacy monitoring tools fall short for SRE teams and DevOps pros tasked with maintaining uptime of key applications in modern, cloud-based IT systems. To have visibility and control over these environments, these teams must collect and analyze more granular, underlying system information — observability data. This article explains why the only way for SRE teams and DevOps pros to extract the necessary insights from this data is through the application of AI capabilities.

The rise of 'Compliance-ops': Bridging the tech and compliance gap in iGaming

Kimberley Wadsworth gambled £36,000 in a fortnight, committing suicide shortly after the loss and leaving her mother homeless as a result. She started gambling in 2015, visiting brick-and-mortar shops and playing at online casinos. There was no one to promptly raise an alert or save her from her dreadful destiny.

Difference between a team lead and an engineering manager and how to transition between these roles

Moving from a team lead role to an engineering manager role is tough, and you will experience many changes along the way. What happens when you become an engineering manager?

Stuff Happens: How Slack and PagerDuty Work Together to Resolve Incidents Quickly

Like death and taxes, IT incidents are inevitable. Issues like server outages and broken code are common—and costly. A single hour of downtime costs businesses more than $300,000 on average, according to Gartner. That’s why a solid incident management strategy is a must for any organization. “People solve incidents, but we can’t do it alone,” says Ali Rayl, Slack’s vice president of customer experience.

Why you need to stop the handover of that shared on-call duty phone

If you are still handing over a shared on-call duty phone or pager (sometimes called an ‘operations phone’), it is time to rethink your process. The Covid-19-induced new normal has had a dramatic impact on our work life and social behavior. We work from home, and that is especially true for the IT workforce. We meet with fewer people and limit our social network to relatives and close friends.

New Features: Heartbeat Monitoring, Incident Actions, Suggested Responders, Incident Re-routing

You might have noticed that we added a new type of alert source a few months ago: Heartbeat alert sources. A Heartbeat alert source expects a signal (the “heartbeat” ping) at regular intervals and alerts you if it doesn’t receive a ping within the specified interval.
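
The heartbeat idea described above can be illustrated with a minimal Python sketch. It is not the vendor's implementation or API: the class name `HeartbeatSource`, its fields, and the choice to start the clock at creation time are all assumptions made for illustration. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatSource:
    """Hypothetical heartbeat alert source (illustrative only, not a real API)."""

    name: str
    interval_seconds: float      # maximum allowed gap between pings
    started_at: float            # assumption: monitoring starts at creation time
    last_ping: Optional[float] = None

    def ping(self, now: float) -> None:
        """Record a heartbeat received at time `now` (in seconds)."""
        self.last_ping = now

    def deadline(self) -> float:
        """Time by which the next ping must arrive."""
        reference = self.started_at if self.last_ping is None else self.last_ping
        return reference + self.interval_seconds

    def is_overdue(self, now: float) -> bool:
        """True if no ping arrived within the interval, so an alert should fire."""
        return now > self.deadline()


# Example: a job that is expected to report in every 60 seconds.
source = HeartbeatSource(name="nightly-backup", interval_seconds=60.0, started_at=0.0)
source.ping(now=45.0)                   # heartbeat received in time
alert_at_100 = source.is_overdue(100.0)  # still inside the 60-second window
alert_at_200 = source.is_overdue(200.0)  # no ping since t=45, window expired
```

In a real setup the check would run on a timer and use a clock such as `time.monotonic()` instead of hand-supplied timestamps; the comparison against the deadline stays the same.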

Introducing New Technology to Skeptical Care Providers

In the coming years, U.S. industries are poised to experience a changing of the guard. The majority of baby boomers will retire in the next decade. Their roles will be taken over by millennials (Generation Y), a digitally native generation that is familiar with modern technology. Generation Y must develop empathy and prepare for the challenge of bringing tech disruption to the workplace. Millennials must introduce new technologies without intensifying the anxiety of skeptical care providers.

This is your Guide for Implementing SRE in NOCs

Network Operation Centers, or NOCs, serve as hubs for monitoring and incident response. A NOC is usually a physical location in an organization. NOC operators sit at a central desk with screens showing current service data. But the functionality of a NOC can also be distributed. Some organizations build virtual NOCs, which can be staffed fully remotely. This allows for distributed teams and follow-the-sun rotations. NOC as a service is another structure gaining in popularity.