The latest news and information on incident management, on-call, incident response, and related technologies.

How We Define SRE Work

At the time of writing, I have officially been at Honeycomb for one year as a site reliability engineer (SRE). I shared my initial experiences and impressions in an earlier post and thought it would make sense to check back in now that I’ve had the opportunity to learn about the team, the culture, and the code base in more depth.

Exploring the Importance of Change Management in Healthcare

Change management is an organized, structured approach with methods that enable healthcare organizations to transform workflows seamlessly. Organizational change management requires the collective involvement of C-level executives and stakeholders to successfully implement changes within a care facility. Change is required when individuals, processes, teams, and tools cannot keep pace with the ever-changing needs and expectations of the organization.

Improved routing for Jira Cloud and Jira Server tickets with multi-project support

If you love Jira, then you probably love customization, and we’ve made your integration with Jira Cloud and Jira Server even better with multi-project support! You can now route your incident tickets and follow-up work to remediation teams' Jira projects directly from FireHydrant, saving you valuable time and clean-up work. Let’s take a look at what has changed and at some additional use cases this integration unlocks.

New Native Slack functionality from PagerDuty - Available Now

At PagerDuty, we invest a significant part of our time listening to our customers. Based on what we have learned from those conversations, we are adding a new set of features to our Slack integration. These features make using PagerDuty from Slack more seamless and let incident responders do their work without switching context, which expedites response times and ultimately helps maintain high customer satisfaction.

The three pillars of great incident response

There’s no one-size-fits-all incident response process. Depending on your organisation’s shape and size, you’ll have different requirements and priorities. But the same three pillars form the core of any good process, whether it’s for the largest e-commerce giant or a scrappy SaaS startup.

It's not ready for production until it has an Operational Readiness Checklist

Maintaining the reliability of complex services just got easier with Operational Readiness Checklists. Service owners and engineering leaders can now evaluate and maintain the production readiness of the services their users rely on every day: spot risks in your service dependencies before they cause incidents, and respond quickly if they do. Before you put a new service into production, readiness checklists help you dot your i's and cross your t's.