Alerting

AIOps Best Practices | First Data/Fiserv: Going Ticketless with AIOps and Moogsoft

At First Data/Fiserv, AIOps dramatically improved incident management and resolution, a transformation that allowed this financial services provider to go almost ticketless. The speakers describe the entire process, which started when the CIO called for a global, next-gen monitoring platform. First Data/Fiserv soon realized that Moogsoft’s collaboration and record-keeping capabilities allowed it to slash tickets by 95%. They also describe how the system was fine-tuned to handle both regular and critical incidents transparently.

Real-Time Cost Alerts and Forecasts for AWS

For many companies, cloud costs are among the top investments these days. With a growing number of services, instances and regions, cloud cost optimization is becoming increasingly painful. Companies use cloud management platforms to optimize costs and increase cloud visibility and security. But staying on top of AWS budgets requires proficiency, agility and time—especially when any glitch can result in massive cost bleeds.

Network Operations Center Best Practices (in 2020)

Your Network Operations Center (NOC) is responsible for network monitoring, incident response, and other network operations activities — and you want to optimize its performance. To achieve your goal, your NOC team assesses data and explores ways to improve its everyday operations. The team may also implement NOC best practices or craft some of its own. NOC teams manage network availability and performance, along with servers, databases, firewalls, devices, and related external services.

Top Five Reasons Why Companies Are Choosing OnPage Over Competitors

OnPage’s intelligent incident management system is the alerting solution of choice for industry-leading organizations. Since the beginning, companies have invested in the OnPage system for its advanced capabilities, out-of-the-box integrations and unmatched 24/7 customer support. We could provide a comprehensive view of OnPage’s competitive advantage, but here are the top five reasons why customers continue to trust OnPage’s incident management system.

Using Dynamic Thresholding to Monitor Your Cloud Platforms

Whether you are new to the cloud, mid-transition, or a veteran of cloud or hybrid systems, no one likes being bothered with useless alerts. The options are simple: if you ignore alerts like a bad cold call, you risk missing a critical one and watching your system crash around you. No one likes to open their inbox to a few hundred alerts they have been ignoring.

Telemetry Everywhere: Observability in the DevOps Cosmos

Rockets constantly blast off into space headed towards planets, aiming to create shiny new stars, while meteors whizz by them, threatening their journeys. That’s how global DevOps expert Helen Beal describes the complicated and risky universe of DevOps practitioners and SRE teams. The rockets are these teams’ frequent code releases. Planets represent customers that benefit from the value — stars — created by these launches.

August 2020 Update: Manage service and system categories in the web portal and define responsibilities centrally

Our August update makes it easy to assign team responsibilities for individual systems through categories. Previously, each team member had to do this in the mobile app; now the team administrator can also do it centrally in the web portal. All details can be found in this blog article.

On-call compensation models

Providing customers with a world-class and seamless user experience is critical for the success of any business. It is therefore important that you have a robust on-call strategy that optimizes the availability of the right subject matter experts, on-call engineers, and support engineers to resolve critical, user-impacting incidents as soon as possible.