
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Holiday Import from iCal Files

SIGNL4 offers powerful duty scheduling and time-based overrides for routing alerts to the right people at the right time. With time-based overrides, for example, you can apply different alerting workflows during business hours, on weekends, and on holidays. Holidays often bring their own signaling requirements and must be considered separately when planning shifts. You can add and edit holidays manually in SIGNL4, or you can import them from iCal files.
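To give a rough idea of what a holiday list in an iCal file contains, here is a minimal sketch that pulls dates and names out of `VEVENT` entries using only the Python standard library. It is not how SIGNL4 performs the import; the function name `parse_holidays` and the sample calendar are hypothetical, and the parser deliberately ignores folded lines, time zones, and recurrence rules.

```python
from datetime import date, datetime


def parse_holidays(ics_text: str) -> list[tuple[date, str]]:
    """Extract (date, name) pairs from the VEVENT blocks of an iCal string.

    Simplified on purpose: only DTSTART and SUMMARY are read, and only the
    date part (first 8 digits) of DTSTART is used.
    """
    holidays = []
    in_event = False
    day = None
    name = ""
    for raw in ics_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            in_event, day, name = True, None, ""
        elif line == "END:VEVENT":
            if in_event and day is not None:
                holidays.append((day, name))
            in_event = False
        elif in_event and line.startswith("DTSTART"):
            # Handles both "DTSTART:20221225T000000Z" and
            # "DTSTART;VALUE=DATE:20221225".
            value = line.split(":", 1)[1]
            day = datetime.strptime(value[:8], "%Y%m%d").date()
        elif in_event and line.startswith("SUMMARY"):
            name = line.split(":", 1)[1]
    return sorted(holidays)


# Hypothetical sample calendar, for illustration only.
SAMPLE = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
DTSTART;VALUE=DATE:20221226
SUMMARY:Boxing Day
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20221225
SUMMARY:Christmas Day
END:VEVENT
END:VCALENDAR"""

for day, name in parse_holidays(SAMPLE):
    print(day.isoformat(), name)
```

For real calendar files, a dedicated iCalendar library is a safer choice than this hand-rolled parser, since the format allows line folding and several date encodings that the sketch does not cover.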

Understanding Service Level Objectives

True reliability takes into account all of the services that exist in your software environment — which is why it can get so complicated. An ecommerce site, for example, might have services that update current inventory in near real time, process payments in the shopping cart, trigger email receipts to send, kick off fulfillment orders, etc. And if one of these services isn’t operating at its best, that can mean money — and in some cases, customers — lost for the company.

How Sumo SREs manage and monitor SLOs as Code with OpenSLO

At Nobl9’s annual SLOconf—the first conference dedicated to helping SREs quantify the reliability of their applications through service level objectives (SLOs)—Sumo Logic shared our contribution of slogen to the OpenSLO community, as well as our commitment to OpenSLO as an emerging standard for expressing SLOs as Code. slogen is an open source, SLO-as-code CLI tool based on the OpenSLO specification.

5 Greatest Challenges of Effective Incident Management and Tips and Tools on Overcoming Them

Planning for potential security incidents has become a crucial element in every organization’s business strategy in today’s complex landscape of data theft, security breaches, and cybercrime. Surveys revealed 41% of business investors and analysts are becoming increasingly worried about cyber threats. One way for organizations to achieve cybersecurity readiness and instill confidence among stakeholders is to build a robust security incident management plan.

More Tools + More People = Increased Complexity

Consider what happens if digital apps or services go down. Companies lose revenue, productivity drops, customer loyalty suffers, and the list of repercussions goes on, depending on the business. Indeed, modern business continuity is contingent on a well-functioning suite of consumer and commercial apps and services.

Micro Lesson: Troubleshoot an Incident Using Root Cause Explorer

The video uses a scenario to demonstrate how to use Root Cause Explorer to analyse and troubleshoot an incident faster. The video shows how Root Cause Explorer helps you dig deeper into the relevant logs and traces in order to isolate the root cause using various dashboards.

Are your SLOs realistic? How to analyze your risks like an SRE

Setting up Service Level Objectives (SLOs) is one of the foundational tasks of Site Reliability Engineering (SRE) practices, giving the SRE team a target against which to evaluate whether or not a service is running reliably enough. The complement of your SLO is your error budget: how much unreliability you are willing to tolerate.
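The relationship between an SLO and its error budget can be sketched with a little arithmetic. The example below assumes a simple availability SLO measured over a fixed window; the function name `error_budget` and the 30-day window are illustrative choices, not something prescribed by the article.

```python
def error_budget(slo_target: float, window_days: int = 30) -> dict:
    """Return the error budget implied by an availability SLO.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is whatever the SLO leaves over: 1 - target.
    budget_fraction = 1.0 - slo_target
    window_minutes = window_days * 24 * 60
    return {
        "budget_fraction": budget_fraction,
        "allowed_downtime_minutes": budget_fraction * window_minutes,
    }


budget = error_budget(0.999, window_days=30)
print(f"{budget['allowed_downtime_minutes']:.1f} minutes of downtime allowed")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43 minutes of tolerated unavailability, which is a useful sanity check on whether a proposed SLO is realistic.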

Sponsored Post

How to implement a Blameless Postmortem (part two)

This is Part 2 of a two-part series on Blameless Postmortems. The previous article went into why blameless postmortems are so effective; this second part goes into detail on how to build your own postmortem process and kick it into overdrive. So you've read our first installment and recognized the value of the blameless postmortem for efficiency, culture, and output. Now you're ready to get off the blame train and kickstart a blameless postmortem process of your own. Where to begin?

May 2022 Update - Templates, scheduler enhancements, landline numbers, and more

Our May update brings Signl templates for manual alerting, improvements to duty scheduling, and various enhancements in the web portal. Another new feature is the ability to send notifications by calling landline numbers. All details can be found in this blog article.