Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Partner Integration - Dynatrace with PagerDuty and Rundeck

Deliver perfect software experiences with real-time intelligence into customer satisfaction and behavior, your applications, and the performance of your hybrid multi-cloud. AI-powered root-cause analysis automatically identifies customer facing performance issues and pinpoints the root-cause within seconds. Open APIs allow ingestion of 3rd party metrics and enable complex system integrations. In this demo, Rob Jahn shares a sophisticated incident remediation workflow incorporating intelligence from Dynatrace, automation in Rundeck, and incidents in PagerDuty.

Building safe-by-default tools in our Go web application

At incident.io, we're acutely aware that we handle incredibly sensitive data on behalf of our customers. Moving fast and breaking things is all well and good, but keeping our customer data safe isn't something we can compromise on. We run incident.io as a multi-tenant application, which means we have a single database (and a single application).

4 Ways To Ensure Reliability of Your Digital Services for GivingTuesday

In today’s digital economy, seconds matter. For mission-driven organizations, seconds can be a matter of life and death, and service reliability can make or break access to suicide and safety hotlines, disaster relief, time-critical health care, food assistance, and more. That’s where real-time digital operations comes in.

Sponsored Post

Using Predictive Analytics Capability to Resolve Critical Incidents

CloudFabrix solution provides a holistic approach for enterprises to implement proactive operations with the objective of eliminating/reducing critical incidents and improving customer satisfaction. The solution primarily relies on applying regression/forecasting models on any time-series data to detect and forecast anomalies. One of the unique features of the solution is the ability to convert unstructured data such as logs/incidents/alerts into time-series data to be used for running prediction models.

Growing pains: the IT Ops maturity model

Modern IT Ops environments have many moving parts that need to work together well, yet are evolving at different speeds. This gap in maturity creates many problems. In this CTO Perspective, Jason Walker, Chief Customer Officer at BigPanda, discusses why IT Ops teams should prioritize maintaining a common maturity across all their IT operations, and how best to do that.

Deploying to production in <5m with our hosted container builder

Fast build times are great, which is why we aim for less than 5m between merging a PR and getting it into production. Not only is waiting on builds a waste of developer time — and an annoying concentration breaker — the speed at which you can deploy new changes has an impact on your shipping velocity. Put simply, you can ship faster and with more confidence when deploying a follow-up fix is a simple, quick change.

Training Intelligent Alert Grouping

Complex incidents are both exhausting and commonplace. In this case, incidents that I am referring to as “complex” are incidents that involve multiple, disparate, notifications in your alert management platform. Perhaps these incidents are logically separated because the underlying systems or services were seen as less coupled than they turned out to be in reality.