
Alerting

Spring 2020 Launch: New Capabilities for a New Digital Era

The ongoing pandemic and resulting economic downturn have led to dramatically changing market conditions. As a consequence, technology teams have become increasingly concerned with the need to minimize their financial risk and reduce costs to mitigate the effects of abruptly pivoting to a fully remote working environment. For some, there has been a struggle to maintain business continuity—i.e., keeping the physical components of the business running when everyone is working from home.

Helicopter Services Company Improves Incident Response by 90 Percent With OnPage BlastIT

Efficient team communication requires the proper set of tools and processes, ensuring that the right people receive timely messages. This way, recipients are well informed of a critical issue and still have time to address the incident. Unfortunately, a large helicopter services company relied on time-wasting procedures to communicate with stakeholders, resulting in delayed incident response and resolution.

Business Continuity Planning and Effective Communication - by Laura Toplis

With many companies relying on remote working during the COVID-19 pandemic, effective communication is more important than ever. Unfortunately, being in the middle of responding to a global pandemic will not prevent your organization from suffering other business disruptions. Likely disruptions you may face include: Cyber/phishing attacks – these attacks can cripple your regular communication methods such as email, or may exploit ineffective communications to extract illegal payments.

Common Operations Problems Solved by OpsRamp Discovery and Monitoring

OpsRamp provides hundreds of out-of-the-box IT infrastructure monitoring templates that capture behavioral and performance metrics for applications, servers, networks, storage, and database instances across hybrid and multi-cloud environments. Combined with powerful AIOps capabilities, modern IT operations teams can leverage both native monitors (pre-built instrumentation for managing IT infrastructure) and custom monitors (user-defined instrumentation for specialized workloads) for proactive IT operations management as a service and responsive troubleshooting.

Integrating dynamic SaaS hosted Uptime Monitoring into your customer-served Applications

Imagine you are rolling out your application to multiple customers, some of whom might even run it on premises. Of course, you want to know whether your application is running fine and the customer is not experiencing any trouble or downtime. You would not want to ship this validation inside your own system, as that system might itself fail at some point. That is why you decide to go for a third-party uptime monitoring solution, e.g. Uptime Monitoring.

OnPage Overrides Silent Switch on iOS and Do Not Disturb Mode

Since its inception, OnPage has been dedicated to providing a powerful critical alerting solution. This mission continues in 2020, as OnPage is pleased to introduce its ability to override the silent switch and Do Not Disturb (DND) mode on iOS. The latest advancements ensure that tasked recipients always receive high-priority, audible OnPage alerts, regardless of their current iPhone settings.

Real-time alerts from Zabbix and escalation with Zenduty

Recently, one of our customers, a 20-member NOC team of a large B2C company, set up Zabbix to monitor a network of more than 1,000 servers, routers, and switches. The NOC team wanted to set up alerting, on-call scheduling, and an escalation matrix for whenever a critical network component encountered any downtime. The NOC team used Slack as the primary communication channel and Zoom for real-time communication. For NOC teams like these running a very large operation, setting up alerting can be very tricky.

Tips for Modern NOCs - Correlating Incidents to the IT Changes that Caused Them

Every NOC engineer will tell you that the first thing they look for in an outage is “what changed?” And they are right to look. While every organization is unique, Gartner reports that on average about 80% of IT incidents today are caused by changes in infrastructure and/or software.

On-call On-boarding Checklist

And it starts with the company culture. Irrespective of how small or large your team is, it’s wise to invest some time in creating a good on-call onboarding plan. A humane on-call is the mark of a good engineering culture. Being on-call means that you’re expected to be reachable for any issues that may occur during your shift. It’s easy to lose any and all motivation by just anxiously anticipating that mid-dinner ping.