
Alerting

Timely Delivery with Enterprise Alert

Murphy’s Law states that anything that can go wrong will go wrong. The challenge for most businesses is putting the right method of communication in place for when the inevitable happens. The only way to handle this is to expect the worst and prepare for it. A key factor in choosing any alerting solution is whether your team can be notified properly when a major outage happens.

Sponsored Post

When Dominoes Fall: Microservices and Distributed Systems need intelligent dataops and AI/ML to stand up tall

As soon as the ITOps technician is ready to grab a cup of coffee, an alert zings in. Zing after zing, the technician has to respond to so many alerts that fatigue sets in. The question is why systems can’t be smart enough to predict bugs and fix them before sending an alert. And imagine what happens when these ITOps personnel have to work with a complex, hybrid cloud of IT systems and applications: they will sink into alert fatigue.

Webinar Recap: Lessons learned from T-Mobile Netherlands' road to zero touch

How close can CSPs come to realizing the zero touch network vision, and what are the best next steps for getting there? To discuss this question, Anodot brought together a panel of experts, including Kim Larsen, CTIO of T-Mobile Netherlands; Ira Cohen, co-founder of Anodot and the company’s chief data scientist; Fernando Elizalde, analyst at GSMA Intelligence; and moderator Justin Springham.

Chapter Nine: In Which Dinesh Experiments with Chaos Engineering

Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now, but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.

Monitoring serverless applications with AWS CloudWatch alarms

Running any application in production assumes that reliable monitoring is in place, and serverless applications are no exception. As modern cloud applications get more and more distributed and complex, the challenge of monitoring availability, performance, and cost gets increasingly difficult. Unfortunately, there isn’t much offered right out of the box from cloud providers.

7 Ways Your Status Page Can Save You

Having a Status Page is like having a dog. A dog alerts you to an incident: a sudden noise, an approaching neighbor, a squirrel… A dog sounds the alarm on an intruder. A dog even alerts you to maintenance by barking at every handyman, garbage truck, and gardener within sight. As a dog fetches the same stick over and over, so does a status page fetch the attention of your users, especially during a live incident, when with each browser refresh they wait for the status to change.

How to Reduce Alert Fatigue: Preventing Noisy Alerts and Error Messages

Monitoring solutions are a vital component in managing an application’s environment. From the systems layer all the way up to the end user’s connection to the app, you want to find out how the platform is performing. Indicators like CPU, memory, the number of connections, and overall health help teams make informed decisions for guaranteeing uptime. Teams monitor metrics (short-term information) and logs (long-term information) mainly from a reactive perspective.

How to Notify Your Team of Errors: Email vs. Slack vs. PagerDuty

Site Reliability Engineering (SRE) and Operations (Ops) teams rely heavily on notifications. We use them to know what’s going on with application workloads and how applications are performing. Notifications are critical to ensuring SREs and Ops teams can resolve errors and reduce downtime. They’re also crucial when monitoring environments, not only in production but also during the dev-test or staging phase.