Latest News

SRE: From Theory to Practice | What's difficult about tech debt?

Aug 4, 2022 By Emily Arnott In Blameless

In episode 3 of From Theory to Practice, Blameless’s Matt Davis and Kurt Andersen were joined by Liz Fong-Jones of Honeycomb.io and Jean Clermont of Flatiron to discuss two words dreaded by every engineer: technical debt. So what is technical debt? Even if you haven’t heard the term, I’m sure you’ve experienced it: parts of your system that are left unfixed or not quite up to par, but that no one seems to have the time to work on.

Driving a customer-focused incident response process

Aug 4, 2022 By Martha Lambert In Incident.io

Deep into an incident, Slack firing, up to your ears in decisions, not sure where to turn next? It’s easy for external communication with your customers to fall far down the list of priorities in these moments. However, these are the exact situations where comms are vital, and where underestimating their importance can have damaging and lasting effects on your organisation.

New! Common Automated Diagnostics for AWS Users

Aug 3, 2022 By Jake Cohen In PagerDuty

Modern cloud architectures centered on AWS are typically a composite of ~250 AWS services and workflows implemented by over 25,000 SaaS services, in-house services, and legacy systems. When incidents fire off in these environments, whether or not a company has built out a centralized cloud platform, distinct expertise is often a necessity.

The Do's and Don'ts of Blameless Incident Postmortems

Aug 3, 2022 By xMatters In xMatters

When an incident inevitably occurs, many organizations have a well-prepared incident management team that springs into action. Whether it’s a power outage or security breach, an incident can damage your company’s operations if not handled properly. A strong incident response team is critical to mitigating any negative impacts successfully. Furthermore, once your team resolves the problem, you should initiate a postmortem to detail the incident and record any lessons learned.

RESOLVE '22: Incident management automation

Aug 3, 2022 By Ryan Taylor In BigPanda

“Make life easier” isn’t a mantra for the lazy—it’s a way to drill down on important automation in the IT Ops room. When Ryan Taylor, VP of solutions engineering at Transposit, talks about his experience and outlook in the IT Ops chair, people tend to listen.

Episode 6: Mooving to... Real release strategies with Jake Laverty

Aug 3, 2022 By Richard Whitehead In Moogsoft

Every product or application needs a release strategy. It’s how you can double-check that everything in your deployment is appropriately tested, validated and verified. Having a standardized release strategy in place allows your team to follow a protocol and reduce the number of unknowns they must face in the product life cycle. However, there are a few considerations to keep in mind to make this critical process run smoothly.

Automate incident response workflows with Eventarc and Datadog

Aug 2, 2022 By Thomas Sobolik In Datadog

Eventarc is a Google Cloud offering that ingests and routes events between GCP products, such as Cloud Run, Cloud Functions, and Pub/Sub, making it easy to build automated, event-driven workflows in complex environments. By taking care of event ingestion, delivery, authorization, and error handling, Eventarc reduces the development overhead that is required to build and maintain these workflows and helps you improve application resilience.

Tell the story of your incident with timeline curation

Aug 2, 2022 By Martha Lambert In Incident.io

It isn’t the first time you’ve heard us say this and it won’t be the last: getting your post-incident process right is a game-changer. Being able to run effective debriefs and create useful postmortems helps us learn from our mistakes, respond better to future incidents and identify how we can build resilience in our product and teams. In short, it’s the thing that shifts the dial from just “fixing” to actually improving.

Anti-patterns in Incident Response that you should unlearn

Aug 2, 2022 By Vishal Padghan In Squadcast

It is important to invest time and effort in understanding why a system performs the way it does and how we can improve it. Companies continue with practices that yield successful results, but ignoring anti-patterns can be far worse than choosing rigid processes. In this blog post we will explore anti-patterns in incident response and why you should unlearn them.

What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today

Aug 2, 2022 By Vivian Chan In PagerDuty

Does your team deal with too much noise? Does your heart sink a bit when you think about how much your rulesets have sprawled in order to manage your event processing needs? That’s why we released Event Orchestration earlier this year to help teams reduce the amount of manual work that goes into event management. Event Orchestration is the next evolution of our Event Rules feature set, which helps to route, enrich, and modify events on ingest to remove noise and automate processes.
