The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

A Day in the Life: James the IT Ops Guy Learns How to Connect All that Data

“Morning, mate,” I greeted Dinesh as he walked into the office. “Nice getup for the big day!” He was wearing a pressed shirt rather than his usual hoodie. “Thought I’d make an effort, you know,” he grinned. We’d been planning intensely for this moment for the last week or so – our meeting with Charlie, the CIO, to present the results of our Moogsoft experiments and ask for permission to extend the rollout across the enterprise.

SOC 1 or SOC 2, which should you comply with and why?

Organizations today are more vulnerable than ever to cyberattacks and data breaches. Whether the attack is executed by an external actor or an insider, the unauthorized intrusion comes at a great cost. This cost may differ, depending on several factors. These include the cause of the breach, the actions taken to remediate the incident, whether there is a history of data infringements, what data was compromised, and how the organization aligned with the authorities and regulators.

Using Coralogix + StackPulse to Automatically Enrich Alerts and Manage Incidents

Keeping digital services reliable is more important than ever. When something goes wrong in production, on-call teams face significant pressure to identify and resolve the incident quickly – in order to keep customers happy. But it can be difficult to get the right signals to the right person in a timely fashion.

New Splunk Synthetic Monitoring Features Help Integrate Uptime and Performance Across the Entire Splunk Platform

For teams that build or maintain modern applications with their end-users in mind, the acquisition of Rigor means that Splunk now offers the most comprehensive synthetic monitoring solution on the market. Rigor, now Splunk Synthetic Monitoring and Web Optimization, provides best-in-class synthetic monitoring capabilities enabling IT Ops and engineering teams to detect and respond to uptime and performance issues within incident response coordination and throughout software development lifecycles.

Creating Custom Slack Commands

Site Reliability Engineers are expected to know everything that’s happening, all of the time. That’s a lot of things! To help you sift through the noise, we’ve developed a feature that lets you find accurate data about your organization on demand. You can do this by sending custom-designed commands to FireHydrant directly from your integrated Slack account.

Accelerate Incident Resolution By Benchmarks-enriched On-call Contexts

In a recent experiment, I polled my colleagues with the following question: “What would you do if the lights went out as you worked at night?” Setting aside the funny responses and the ones that reveal who you want around in an emergency, most of my colleagues said they would check whether the problem was broader than their own home.