The latest news and information on incident management, on-call, incident response, and related technologies.

Incident Response with Atlassian's Opsgenie

Learn all about incident response with Atlassian's Opsgenie. Respond to incidents from the Incident Command Center, identify potential root causes in the Incident Investigation view, and keep track of key information in the Incident Timeline. Once an incident is resolved, fill out the postmortem template and export it to Confluence.

Respond to Incidents Faster with iLert

iLert is an alerting and on-call management solution that helps ops teams respond to incidents faster. It extends monitoring tools such as Icinga with advanced alerting through SMS, phone calls, and push notifications, and lets you manage on-call duty with schedules and escalations. iLert is a SaaS company based in Germany and has been an Icinga integration partner for over 5 years. This blog post outlines some of the features available when using Icinga together with iLert.

Can Observability Improve IT Ops? BigPanda's Field CTOs have the answer.

A Harrowing Landscape

The increasing complexity of modern services is forcing IT Ops teams to employ a growing landscape of disparate tools to monitor the health of their IT stack. In fact, the number of tools has grown so much in the last few years that one wonders how IT Ops teams can effectively configure and maintain them, let alone ingest and process all the events they create.

Unraveling Real-Time Health System to Address COVID-19 Challenges

The overarching vision of a real-time health system (RTHS) is to help healthcare delivery organizations (HDOs) move past the complexities of the digital era and align their resources to deliver value to patients, reaping the benefits of a more streamlined and efficient orchestration in the process.

Importance of Operational Data in Incident Context

Network/Security Operations Center (NOC/SOC) engineers and service desk personnel are tasked with processing numerous incidents as quickly as possible. To resolve an incident, however, they must perform a range of activities, including collecting operational data such as metrics, logs, and traces from different tools. In many cases, the process also involves coordinating with other IT personnel or creating a war room to bring the incident to closure.

Stay code-connected with 12 new DevOps features

Our mission is to unleash the potential of all teams by harnessing the power of collaboration tools and practices. This is particularly true for teams practicing DevOps, which is all about unlocking collaboration between development, IT operations, and business teams. However, this increased collaboration can come at a cost to developers.

Oncall and COVID-19 Survey Results

One of my concerns as COVID-19 took hold in the US was how it would affect tech teams that are oncall. It can be extremely challenging to be oncall during a “normal” time, and this has been anything but normal. So, I decided to create a survey to learn more about what people’s experiences have been. The survey was conducted from April 8 to April 27, 2020, via a Google Form. It was anonymous and had 141 respondents.

The $5B DevOps Stranglehold

Ten years ago, NewRelic, DataDog, Splunk, Dynatrace, and SolarWinds built tools we loved to use. They were easy to implement and solved problems quickly and efficiently. Each company was known primarily for a single, well-conceived product: NewRelic’s APM, Splunk’s log file analyzer, DataDog’s server monitor, SolarWinds’ network performance monitor. These companies were beloved by users during the 2000s. Fast forward to 2020, and the world is very different.