Latest Posts

The Do's and Don'ts of Blameless Incident Postmortems

When an incident inevitably occurs, many organizations have a well-prepared incident management team that springs into action. Whether it’s a power outage or security breach, an incident can damage your company’s operations if not handled properly. A strong incident response team is critical to mitigating any negative impacts successfully. Furthermore, once your team resolves the problem, you should initiate a postmortem to detail the incident and record any lessons learned.

Sponsored Post

How to Spot the Effects of Alert Fatigue

Imagine being part of an overactive group chat that causes your phone to buzz every few minutes. In the beginning, you open every message, but you soon realize that most of them aren't important, or at least aren't relevant to you. So, what do you do next? Maybe you let the messages pile up and check them later. Or perhaps you mute the group chat and ignore the incoming messages altogether. You can blame this tendency to ignore or avoid incoming messages or notifications on one culprit: alert fatigue.

Why SREs Need to Embrace Chaos Engineering

Reliability and chaos might seem like opposite ideas. But, as Netflix learned in 2010, introducing a bit of chaos—and carefully measuring the results of that chaos—can be a great recipe for reliability. Although most software is created in a tightly controlled environment and carefully tested before release, the production environment is harsher and much less controlled.

The Improved xMatters Group Experience: Product Feature Updates

We’re constantly looking for new ways to help DevOps, SREs, and operations teams automate operations workflows, secure infrastructure and applications, and rapidly deliver their products at scale. This commitment to our customers — and yours! — led us to redesign the way you experience groups in xMatters.

What It Means to Be an Incident Commander

Leadership is essential in an organization. Establishing a leadership hierarchy helps teams avoid getting confused about who to turn to with questions and concerns, allowing them to focus their efforts where needed. High-quality leadership is vital to success but becomes even more important when the pressure to resolve an issue with minimal downtime is turned up.

Best Practices for Communicating with Customers During an Outage

Incidents are unavoidable when running a business. When an incident does inevitably occur, communication is critical while your teams are working to minimize the impact and expedite a solution. For technical resolvers, the first steps during an incident are to look for any leads that point to the source of the issue. Customer service and communications teams, however, must prioritize establishing effective communication with impacted users. Both teams have the right frame of mind, but they need to be aligned. This becomes more complicated when the incident is an outage.

Introducing xMatters New Integration with Everbridge Signal

When Russia invaded Ukraine on February 24, 2022, it sent ripples through many markets. Production at Ukrainian car factories that supplied Europe was interrupted, oil and gas supply from Russia was throttled, and the supplies of steel, sunflowers, corn, and wheat were affected. Prices of sugar and petroleum surged, a threat of long-lasting high inflation emerged, and social unrest began to brew, with cyber-attacks both coming out of and going into Russia.

How To Build an Escalation Policy for Effective Incident Management

Regardless of your organization’s size, industry, or security measures, you will inevitably face IT incidents. But what do you do if an incident affects a critical system and your on-call responders can’t resolve it? Does your team have a set of clearly outlined next steps they should take to handle the issue? Answering these questions can be complicated, even more so for large organizations that rely on cloud-based services to fuel their IT environment.

Major Incident Process Is at the Heart of Effectiveness

Read the new white paper on major incident management. Businesses need to be prepared for both minor and major incidents affecting their technologies, whether that means an integration disconnecting or an entire system being taken offline. Preparation ensures that businesses can not only minimize losses but also protect themselves, and potentially their clients, from risky impacts.