
Latest News

How PagerDuty Helps Manage Hybrid Infrastructure and Complex Ops Across Industries

If there’s one thing we learned from the 80+ sessions at Summit 2021, it’s that companies across industries continue to accelerate innovation to meet growing customer expectations of always-on services across all channels. In financial services, disrupting traditional banking or rethinking access to advisory services comes with operational and regulatory challenges.

Contextual Intelligence and Observability: Without the Former, You Really Don't Have the Latter

Observability is a hot term in the industry, but don’t let it fool you: having visibility into your organization's apps and services only gives you partial clarity into a system’s overall performance. To get a full understanding of your monitoring data, you need to apply contextual intelligence.

New Product Integration! Microsoft Teams Video

On the heels of our Microsoft Teams integration release to streamline incident management, we’re excited to share that we now support Microsoft Teams Video capabilities. We generate Microsoft Teams video conference links for each Blameless incident for fast and easy collaboration. Microsoft Teams Video joins Zoom, Google Meet, and GoToMeeting in our video integration suite.

Less is more: Incident management and monitoring in hybrid IT infrastructures

Many companies are continuously modernizing their infrastructure, but there is no standard path to the perfect IT infrastructure. Still, hybrid architectures have become the status quo in enterprises. Almost all organizations have migrated at least parts of their assets to the cloud or run applications as cloud services. At the same time, businesses want to dovetail their IT architecture with software development and are therefore embracing dynamic infrastructures.

Resilience in Action E9: Vulnerability, Compassion, and Post-Incident Reviews in the Emergency Room with Dr. Al'ai Alvarez

What can software engineers learn from post-incident reviews that physicians do in the emergency room? In our ninth episode, Christina, a member of the Blameless strategy team, guest-hosts the podcast to interview both Kurt Andersen and Al'ai Alvarez, MD (@alvarezzzy). Dr. Alvarez is an assistant clinical professor of Emergency Medicine at Stanford. Clinically, he’s an emergency physician.

What is Incident Management in IT and Why does it matter?

Incident management is the process of identifying and resolving problems that occur in IT services. Incident management is also used as a metric to measure the health of the IT service desk. Let’s discuss what incident management is, why it matters to your business, and how you can apply it to your organization.

Chapter Nine: In Which Dinesh Experiments with Chaos Engineering

Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now, but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.

Most frequently asked questions surrounding Google's Cloud Operations Sandbox

Cloud Operations Sandbox serves as a simulation tool for budding SREs to learn best practices from Google and apply them to real cloud services. In this blog, we have compiled a list of FAQs surrounding the use of Google's Cloud Operations Sandbox. The Google SRE sandbox provides an easy way to get started with the core skills you need to become an SRE.