
The latest news and information on incident management, on-call, incident response, and related technologies.

Incident Management with Datadog

When your application experiences an outage, the tools your team uses to manage its response can make all the difference in how quickly it resolves the problem and prevents it from recurring. An effective incident management workflow depends on accessible, integrated tools as well as clear, direct channels of communication. And even after the matter has been resolved, documenting and analyzing the outage are vital to ensuring it doesn't happen again.

Performing Zabbix Alert Correlation and Incident Acceleration with CloudFabrix AIOps

The CloudFabrix AIOps 360 solution can ingest alerts, events, and metrics from various monitoring tools to perform event correlation, reduce alert noise, and accelerate incident resolution. In this blog, I will cover how Zabbix integrates with our AIOps 360 solution. Zabbix is a popular open-source monitoring platform used by many enterprises and MSPs, including some of our customers.

Make Informed Care Decisions With an EHR and Communication Tool Integration

Electronic health records (EHRs) are real-time patient health record systems designed to share patient information securely with authorized users. These users include staff in medical labs, imaging facilities, pharmacies, and emergency departments. Essentially, EHRs provide medical information to everyone involved in the patient-care continuum. OnPage continuously explores new ways to expand its value and enhance business processes and workflows for clients.

AIOps Best Practices | First Data/Fiserv: Going Ticketless with AIOps and Moogsoft

At First Data/Fiserv, AIOps dramatically improved incident management and resolution, a transformation that allowed this financial services provider to go almost ticketless. The speakers describe the entire process, which started when the CIO called for a global, next-gen monitoring platform. First Data/Fiserv soon realized that Moogsoft’s collaboration and record-keeping capabilities allowed it to slash tickets by 95%. They also describe how the system was fine-tuned to handle both regular and critical incidents transparently.

Nishant Singh shares his thoughts on being an SRE

Nishant Singh is an SRE at LinkedIn based in Bangalore. He is currently building and maintaining applications that improve the site's overall MTTD (mean time to detect) and MTTR (mean time to recover). He likes to build services and play with the latest technologies. Before LinkedIn, Nishant worked as a DevOps engineer for a few companies in the security and e-commerce domains, where he was primarily responsible for building infrastructure, deployment pipelines, and security.

Network Operations Center Best Practices (in 2020)

Your Network Operations Center (NOC) is responsible for network monitoring, incident response, and other network operations activities — and you want to optimize its performance. To achieve your goal, your NOC team assesses data and explores ways to improve its everyday operations. The team may also implement NOC best practices or craft some of its own. NOC teams manage network availability and performance, along with servers, databases, firewalls, devices, and related external services.

Top Five Reasons Why Companies Are Choosing OnPage Over Competitors

OnPage’s intelligent incident management system is the alerting solution of choice for industry-leading organizations. Since the beginning, companies have invested in the OnPage system for its advanced capabilities, out-of-the-box integrations, and unmatched 24/7 customer support. While we could offer a comprehensive view of OnPage’s competitive advantages, here are the top five reasons customers continue to trust its incident management system.

Telemetry Everywhere: Observability in the DevOps Cosmos

Rockets constantly blast off into space headed towards planets, aiming to create shiny new stars, while meteors whizz by, threatening their journeys. That’s how global DevOps expert Helen Beal describes the complicated and risky universe of DevOps practitioners and SRE teams. The rockets are these teams’ frequent code releases. Planets represent the customers who benefit from the value (the stars) created by these launches.

August 2020 Update: Manage service and system categories in the web portal and define responsibilities centrally

Our August update makes it easy to assign team responsibilities for individual systems through our categories. Previously, each team member could do this only in the mobile app; now the team administrator can also do it centrally in the web portal. All the details can be found in this blog article.

The 3 musts for every FinTech incident management pro

Few industries have experienced such disruptive whiplash as financial services. With agile, innovative, and fearless fintechs encroaching at a dizzying pace, traditional banking institutions have had to completely rethink their business and revenue models and their customer engagement initiatives.