The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.
BigPanda gives the Incident Commander award to IT Ops superstars: people who go above and beyond in this high-pressure, critical line of work. In 2021, Ben Narramore, Director of Operations/Service Management at PlayStation, received the award for his ability to handle high-impact global incidents with exemplary professionalism and skill. Let’s find out what he’s been up to…
Whiskey and Wisdom is a monthly executive-only forum where IT Operations leaders can network independently and discuss high-level AI operations and IT Ops strategies with their industry peers. Our most recent session centered on justifying AIOps: proving the value the technology brings to the table.
For teams who deploy software to users around the world, every second counts when responding to outages and other incidents. It’s important that you have tools in your arsenal that are up to the challenge. Service monitoring, alerting, collaboration, and visibility are all essential components of a well-implemented incident response plan.
In the IT world, outages and service disruptions are a fact of life. Stuff hits the fan… Stuff happens! And it can happen to any service provider, even the best-designed and best-managed SaaS applications and platforms. One reason stuff happens is a failure to adhere to best practices. To minimize the potential for problems, here we run through some of the key points from the cloud platform management best practice playbook.
SIGNL4 offers powerful duty scheduling and time-based overrides for routing alerts to the right people at the right time. With time-based overrides, for example, you can apply different alerting workflows during business hours, weekends, holidays, etc. Holidays can bring additional signaling requirements and must be considered separately when planning shifts. You can add and edit holidays manually in SIGNL4, or you can import them from iCal files.
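To make the idea of time-based overrides concrete, here is a minimal Python sketch of how an alert might be routed to a different workflow depending on the time it arrives. This is not SIGNL4's API or implementation: the function name `pick_workflow`, the workflow labels, the 09:00 to 17:00 business hours, and the hard-coded holiday set are all assumptions made for illustration.

```python
from datetime import date, datetime

# Hypothetical holiday calendar. In a real setup these dates might be
# entered manually or loaded from an iCal file.
HOLIDAYS = {date(2022, 12, 25), date(2023, 1, 1)}


def pick_workflow(now: datetime, holidays=HOLIDAYS) -> str:
    """Return the (assumed) alerting workflow label that applies at `now`."""
    # Holidays take priority over every other rule.
    if now.date() in holidays:
        return "holiday"
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if now.weekday() >= 5:
        return "weekend"
    # Assumed business hours: 09:00 up to, but not including, 17:00.
    if 9 <= now.hour < 17:
        return "business_hours"
    return "after_hours"
```

Checking holidays first means a holiday that falls on a weekday or a weekend always gets the holiday workflow, which matches the point that holidays need to be considered separately from regular shifts.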
True reliability takes into account all of the services that exist in your software environment — which is why it can get so complicated. An ecommerce site, for example, might have services that update current inventory in near real time, process payments in the shopping cart, trigger email receipts to send, kick off fulfillment orders, etc. And if one of these services isn’t operating at its best, that can mean money — and in some cases, customers — lost for the company.
Consider what happens if digital apps or services go down. Companies lose revenue, productivity drops, customer loyalty suffers, and the list of repercussions goes on, depending on the business. Indeed, modern business continuity depends on a well-functioning suite of consumer and commercial apps and services.