Network downtime leads to productivity loss and customer attrition, and it can affect business growth moderately or even severely. The usual causes of a network outage include the following. Errors in network endpoints: a network bottleneck or a temperature spike can interrupt a client’s network operations and then snowball into an outage. Operational slip-ups: according to research by Uptime Institute, 70% of data center and service downtime is due to human error.
The concept of AIOps is simple: infuse artificial intelligence (AI) into IT to make operations faster and more efficient. In theory, AIOps at its best should lead to an autonomous IT environment in which functions run themselves with little or no human intervention. In practice, the path to this nirvana state is anything but straightforward, and it raises several questions. Where should you start? How do you measure the value? Is AI ready to scale across production environments?
Add multiple responders to one or more incidents. This builds empathy, transparency, and future context around an incident, which in turn helps reduce MTTR.
A few months ago I wrote about sending notifications to Rocket.Chat. While that messaging tool is quite powerful, one may also prefer to keep it simple. So let’s also address the good old IRC.
In preparation for the upcoming Developer Observability Masterclass we’re hosting at Lightrun with Thoughtworks and RedMonk, I sat down for a brief interview with Tom Granot – the Director of Developer Relations at Lightrun. Tom will MC the event as he did for the Developer Productivity Masterclass we ran back in December.