Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Best Practices to implement in Incident Management

An incident has roughly five stages: 1. Assess impact 2. Inform customers (status page) 3. Identify the issue 4. Mitigate the issue 5. Resolve the incident. After that come follow-up and further work. Note that step 2 should be ongoing as you progress: update the status page at reasonable intervals, e.g. every 15-20 minutes unless you specify otherwise.
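As a rough illustration of the stages and the update cadence, here is a minimal Python sketch. The names `Stage`, `UPDATE_INTERVAL` and `status_update_due` are made up for this example and are not part of any tool; the 20-minute interval is just the upper end of the 15-20 minute range mentioned above.

```python
from datetime import datetime, timedelta
from enum import Enum


class Stage(Enum):
    """The five incident stages, in order (illustrative names)."""
    ASSESS_IMPACT = 1
    INFORM_CUSTOMERS = 2
    IDENTIFY_ISSUE = 3
    MITIGATE_ISSUE = 4
    RESOLVE_INCIDENT = 5


# Assumed cadence: post a status page update at least every 20 minutes.
UPDATE_INTERVAL = timedelta(minutes=20)


def status_update_due(last_update: datetime, now: datetime,
                      interval: timedelta = UPDATE_INTERVAL) -> bool:
    """Return True when the time since the last update reaches the interval."""
    return now - last_update >= interval


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    print(status_update_due(last, datetime(2024, 1, 1, 12, 10)))
    print(status_update_due(last, datetime(2024, 1, 1, 12, 25)))
```

A check like this could run in whatever loop or reminder bot your team already uses during an incident, so that informing customers stays ongoing while the other stages progress.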

Manual Monitors: Everything you need to know

In this post, I will explain what manual monitors are. Manual monitors are monitors that do not actively monitor any resources. You can use them if you are using an external monitoring tool that can ping the Fyipe API to create incidents. They are also helpful for creating incidents manually for your customers and showing them on your status page. Manual monitors can be created in just 2 simple steps.

SLA vs SLI vs SLO: Know the differences between them.

SLA stands for Service Level Agreement. It is a formal agreement between you and your customer that describes the reliability of your product or service. For example, it might say: our product will be online 99 percent of the time annually, and if we fail to achieve that objective we will give 30% of your annual license fee back. As this example shows, SLAs also include penalties in the contract.
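To make the example concrete, here is a small Python sketch of that penalty clause. The function name `sla_refund` and the license fee used below are hypothetical; the 99 percent target and the 30% refund come from the example above, and a real contract would define how uptime is measured.

```python
def sla_refund(uptime_percent: float, annual_fee: float,
               target_percent: float = 99.0,
               refund_rate: float = 0.30) -> float:
    """Refund owed under the example SLA.

    If measured annual uptime meets the target, nothing is owed;
    otherwise a fixed share of the annual license fee is returned.
    """
    if uptime_percent >= target_percent:
        return 0.0
    return annual_fee * refund_rate


if __name__ == "__main__":
    # Hypothetical customer paying 10,000 per year.
    print(sla_refund(99.2, 10_000))  # target met, no refund
    print(sla_refund(98.5, 10_000))  # target missed, 30% of the fee
```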

How to reach 99.99% uptime: High Availability in Practice.

With most businesses finding it hard to achieve even 99.9% uptime throughout the year, a goal of 99.99% uptime looks daunting to developers. It's like asking someone to build a bridge that would never collapse or a machine that would never break down no matter what. In short, it is a hard goal, but it is achievable. Here's how to reach 99.99% uptime for your business.
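To see why each extra nine is so much harder, it helps to turn an uptime percentage into the downtime it allows per year. The sketch below is plain arithmetic, assuming a 365-day year; `allowed_downtime_minutes` is a name chosen for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year that still meet the uptime target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.9% the yearly budget is roughly 8.8 hours; at 99.99% it shrinks to under an hour, and at 99.999% to a little over five minutes.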