The latest news and information on incident management, on-call, incident response, and related technologies.

Supercharged with AI

One of the most painful parts of incident management is keeping on top of the many things that happen when you’re right in the middle of an incident. From figuring out and communicating what’s happening, to ensuring you learn from previous incidents, and even capturing the right actions – incidents are hard, but they don’t need to be this hard.

Mastering IT Alerting: A Short Guide for DevOps Engineers

$575 million was the cost of a huge IT incident that hit Equifax, one of the largest credit reporting agencies in the U.S. In September 2017, Equifax announced a data breach that impacted approximately 147 million consumers. The breach occurred due to a vulnerability in the Apache Struts web application framework, which Equifax failed to patch in time. This vulnerability allowed hackers to access the company's systems and exfiltrate sensitive data.

Debugging Go compiler performance in a large codebase

As we’ve talked about before, our app is a monolith: all our backend code lives together and gets compiled into a single binary. One of the reasons I prefer monolithic architectures is that they make it much easier to focus on shipping features without having to spend much time thinking about where code should live and how to get all the data you need together quickly. However, I’m not going to claim there aren’t disadvantages too. One of those is compile times.

A New Approach To Incident Management

In recent years, IT departments have faced the challenge of adapting to an evolving landscape of demands. While the primary focus of traditional incident management solutions has been to reduce downtime, it's become clear that just reducing the amount of downtime isn’t sufficient. To truly mitigate the total impact of downtime, there must be a focus on reducing the damage and costs that accumulate while you are down.

Navigating AI in SOC

With notable advancements in Artificial Intelligence (AI) within cybersecurity, the prospect of a fully automated Security Operations Center (SOC) driven by AI is no longer a distant notion. This paradigm shift not only promises accelerated incident response times and a limited blast radius but also transforms the perception of cybersecurity from a deterrent to an innovation enabler.

Incident response that's fast and cost-effective: Why 3 companies chose Grafana Cloud

When an incident occurs, every second counts. On-call staff need to quickly get all the relevant information in front of them in a way that’s easy to digest so they can more successfully investigate the issue and communicate with relevant stakeholders.

Downtime Can Affect Anyone: Tired of Hearing "Are You Down"?

Are unexpected downtimes causing headaches for your business? Tired of constantly hearing the dreaded question, "Are you down?" We've got the solution you've been searching for! Introducing StatusCast - your ultimate partner in proactive communication during service outages. Our latest video, "Downtime Can Affect Anyone," sheds light on the impact of unplanned disruptions and the game-changing features that StatusCast brings to the table.

The Unplanned Show, Episode 25: Learning from incidents with Nora Jones

The incident is resolved. The service is restored. Now what? To dig into how teams can learn from incidents and improve resiliency, this episode features the author of "Chaos Engineering" (O'Reilly), creator of the "Learning From Incidents" community, and founder of Jeli.io (recently acquired by PagerDuty): the one, the only, Nora Jones.

Discover the Sweet Spot: Offering Five Levels of Component Depth

Indulge in our video "Have Your Cake and Eat it Too: Offering Five Levels of Component Depth." Explore how StatusCast delivers a delectable experience by providing five levels of component depth, allowing you to have complete control over your monitoring and incident management. Discover the sweet spot where efficiency meets customization and learn how StatusCast is revolutionizing the way you handle incidents. Watch now and savor the taste of seamless component management!

Modernize your ITSM with the New PagerDuty Application for ServiceNow

We live in an always-on world, where things move fast and break often. Building stronger resilience is critical for operational efficiency and delivering great customer experiences. CIOs have heavily invested in ITSM solutions, but a centralized, queued approach is no longer meeting the needs of modern organizations when it comes to critical, customer-impacting issues.