The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

What's All the Fuss About Business Continuity Planning

Digital transformation has created more gateways for vulnerability and risk. So in addition to natural disasters that can impact a business, organizations are faced with cyberattacks that can truly cripple their business. A solid business continuity plan makes sure that your company is ready for whatever may come its way, be it fire, flood, critical technical failure, or a cyberattack.

How No-Code Integrations Help Incident Management Scale

Do you think no-code is just another buzzword with no real meaning? Well, maybe it is in some contexts. But if you want an example of how no-code solutions can matter in the real world, look no further than incident management. Let us explain by walking through what no-code solutions mean in the context of incident management, how they work, and how they can help teams scale and streamline their operations.

2019 Hurricane Season: Solidify a Business Continuity Plan With a Mass Notification Solution

Summer is typically synonymous with beach days, outdoor barbecues and fulfilling weekend getaways. Unfortunately, the summer months aren’t only about enjoyable moments and exciting vacations. They also bring tropical storm season, with higher risks of destruction, community displacement and business operation disruption. With this potential for human and business peril, it’s important for organizations to implement a business continuity plan equipped with a robust communication strategy.

Best Practices for Managing Multiple On-Call Teams

Alerting has come a long way, from the days of paging a single on-call administrator in the middle of the night to multiple on-call teams that run and manage incident response around the clock. As organizations grow and scale, responding to incidents gets more complex, and successfully resolving an incident often requires more than one team.

Mark Henderson from Stack Overflow shares his experience on being an SRE

Mark Henderson has been a Site Reliability Engineer at Stack Overflow since 2015. Before this he worked as the sole systems administrator at a small software company in Sydney, Australia. These days, he lives in South Australia and works from home with his wife and two children.

Serverless Event-Driven Workflows with PagerDuty and Amazon EventBridge

This week’s AWS Summit in New York was an exciting one for both AWS and PagerDuty. The AWS team rolled out Amazon EventBridge, a set of APIs for AWS CloudWatch Events that makes it easy for AWS SaaS partners to inject events for their customers to process in AWS. PagerDuty is excited to continue and deepen our long partnership with AWS by supporting EventBridge as a launch partner.

No CMDB? No problem. Not for BigPanda.

I hear it all the time when talking to future BigPanda customers: “I’m not sure BigPanda can really help me correlate all these alerts because our CMDB is very immature.” Sometimes they don’t even have a CMDB, and they incorrectly assume this disqualifies them from meaningful noise reduction and alert correlation. I’m happy to tell you the same thing I tell the folks who are looking at BigPanda for the first time: “No CMDB? No problem!”

Assessing the Per-Minute Cost of an Outage for YOUR Company

Software vendors and analysts love to rattle off scary numbers about how many thousands of dollars per minute or hour an infrastructure outage will cost the typical company. Those numbers can be scary indeed; for example, Gartner quotes $5,400 per minute as the cost borne by a medium to large-sized retailer. Your company, however, is most likely not identical to the “typical” company on which the numbers are based.

July 2019 Update: Alert Opt-In and Out, Apps Section and Getting Started

The July 2019 update introduces the option to opt out of certain alert categories, along with some enhancements to the Web portal. You can now opt in to or out of certain categories under Settings -> Services & Systems. This works on a per-user basis and is useful when you do not want to receive certain alerts but your team members still need to get them. Another scenario is listening in: you can see what is going on while all notifications are muted.