Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Non-Abstract Large System Design (NALSD): The Ultimate Guide

Non-Abstract Large System Design (NALSD) is an approach where intricate systems are crafted with precision and purpose. It holds particular importance for Site Reliability Engineers (SREs) due to its inherent alignment with the core principles and goals of SRE practices. It improves the reliability of systems, allows for scalable architectures, optimizes performance, encourages fault tolerance, streamlines the processes of monitoring and debugging, and enables efficient incident response.

The Unplanned Show, Episode 25: Learning from incidents with Nora Jones

The incident is resolved. The service is restored. Now what? To dig into how teams can learn from incidents and improve resiliency, this episode features the author of "Chaos Engineering" (O'Reilly), creator of the "Learning From Incidents" community, and founder of Jeli.io (recently acquired by PagerDuty): the one, the only, Nora Jones.

Navigating AI in SOC

With notable advancements in Artificial Intelligence (AI) within cybersecurity, the prospect of a fully automated Security Operations Center (SOC) driven by AI is no longer a distant notion. This paradigm shift not only promises accelerated incident response times and a limited blast radius but also transforms the perception of cybersecurity from a deterrent to an innovation enabler.

Incident response that's fast and cost-effective: Why 3 companies chose Grafana Cloud

When an incident occurs, every second counts. On-call staff need to quickly get all the relevant information in front of them in a way that’s easy to digest so they can more successfully investigate the issue and communicate with relevant stakeholders.

Downtime Can Affect Anyone: Tired of Hearing "Are You Down"?

Are unexpected downtimes causing headaches for your business? Tired of constantly hearing the dreaded question, "Are you down?" We've got the solution you've been searching for! Introducing StatusCast - your ultimate partner in proactive communication during service outages. Our latest video, "Downtime Can Affect Anyone," sheds light on the impact of unplanned disruptions and the game-changing features that StatusCast brings to the table.

11 Best Incident Management Software in 2024

Incident management software has become a critical part of any IT Service Management (ITSM) strategy for keeping business IT systems running smoothly. This technology isn't just about putting out fires; it's about keeping the digital pulse steady and strong. When IT hiccups occur, this software steps in with a systematic approach to fix them, so that such interruptions don't further interfere with your organization's operations and potentially cause downtime or financial losses.

Discover the Sweet Spot: Offering Five Levels of Component Depth

Indulge in our video "Have Your Cake and Eat it Too: Offering Five Levels of Component Depth." Explore how StatusCast delivers a delectable experience by providing five levels of component depth, allowing you to have complete control over your monitoring and incident management. Discover the sweet spot where efficiency meets customization and learn how StatusCast is revolutionizing the way you handle incidents. Watch now and savor the taste of seamless component management!

Modernize your ITSM with the New PagerDuty Application for ServiceNow

We live in an always-on world, where things move fast and break often. Building stronger resilience is critical for operational efficiency and delivering great customer experiences. CIOs have heavily invested in ITSM solutions, but a centralized, queued approach is no longer meeting the needs of modern organizations when it comes to critical, customer-impacting issues.

Predictions for 2024 - Learn from PagerDuty's CIO and CISO!

Join us as we kick off the year with our leaders discussing their 2024 predictions. Automation and generative AI will continue to play a big role in everything a CIO and CISO does, so come and learn from PagerDuty’s CIO, Eric Johnson, and CISO, Heather Hinton, about their top predictions for 2024 and how to best adopt automation and generative AI into your department’s strategies.

How to optimize your cloud infrastructure management

As on-premises infrastructure and workloads increasingly migrate to the cloud, you’ve undoubtedly encountered many challenges in managing complex cloud architectures. These hurdles include balancing cost efficiency and security while maintaining a seamless, high-performance infrastructure. Navigating your cloud infrastructure landscape requires a thorough understanding of its virtualized elements: servers, software, network devices, and storage.