
Alerting

RetroDuty: How We Scale Continuous Improvement Beyond Engineering at PagerDuty

If you’ve worked on a team that has adopted Agile techniques, you’ve probably heard of a retrospective. If not, here’s the TL;DR: A retrospective is a meeting in which a team connects regularly to reflect on what happens throughout a project and continuously improve how they work moving forward.

Meet Root Cause Changes from BigPanda - IT Ops, NOC and DevOps Teams' Best Friend for Supporting Fast-Moving IT Stacks

TL;DR: Fast-moving IT stacks see frequent, long and painful outages. Thousands of changes – planned, unplanned and shadow changes – are one of the main reasons behind this. Until now, IT Ops, NOC & DevOps teams didn’t have an easy way to get a real-time answer to the “What Changed?” question – the answer that can help reduce the duration of outages and incidents in these fast-moving IT stacks. Now, with BigPanda Root Cause Changes, they do.

Rise of the Digital Operations Ecosystem

Many organizations today are dealing with a lot of complexity and disconnected tools. Teams and departments are running in parallel but siloed from each other. People are burned out from a lot of manual work, and everyone is crunched for time. This is not a happy ecosystem to live in. If this digital ecosystem doesn’t work together, your teams don’t know what’s going on and they lack the right information.

Enable SSO and MFA by adding SIGNL4 as an enterprise app in Azure Active Directory

This article describes how to authorize SIGNL4 as an enterprise app for Azure AD users (Marketplace Link). This is important if you want to roll out SIGNL4 in your company using existing Azure AD user accounts.

Drive continuous improvement with shareable postmortems in Opsgenie

It’s a given that customers expect software and IT services to be high-performing and always on. And, because incidents and downtime will always be a thing, we believe that how you respond can make or break the customer experience. We’ve learned this lesson first hand while refining our own incident management process over the last decade.

It Came From Below

I’m going to assume most people who read this blog are familiar with PagerDuty. But just in case anyone isn’t, PagerDuty is a tool we use in IT to notify us if some predefined check has failed. Maybe a key process has died or maybe we’re not seeing our expected traffic volume or maybe our server has stopped responding to ping. Whatever it is, PagerDuty will relentlessly, remorselessly, and loudly notify whoever is on call that something needs attention.

Extending the Competitive Advantage in Telecom

The telecom industry has always seemed to navigate well through tech changes. As the industry has evolved, it’s managed to transform from landline to mobile carriers, then from voice calls to messaging and data-centric networks. In many developed markets telcos are creating ecosystems for the data-driven economy. The next frontier is shaping up to be one driven by machine learning (ML) and artificial intelligence (AI).