Latest News

Reducing our pager load

Apr 22, 2022 By Lisa Karlin Curtis In Incident.io

At incident.io, we pride ourselves on providing a great product to our customers. We’re also a small team, so we move fast and (whisper it not) occasionally break things. To mitigate the impact on our customers, we have our app set up so that every time our application raises an error (via Sentry), we get paged.

SRE: From Theory to Practice | What's difficult about on-call?

Apr 21, 2022 By Emily Arnott In Blameless

We launched the first episode of a webinar series tackling one of the major challenges facing organizations: on-call. In “SRE: From Theory to Practice | What’s difficult about on-call?”, Blameless engineers Kurt Andersen and Matt Davis are joined by Yvonne Lam, staff software engineer at Kong, and Charles Cary, CEO of Shoreline, for a fireside chat about everything on-call. As software becomes more ubiquitous and necessary in our lives, our standards for reliability grow alongside it.

Accelerate AIOps Scalability With New Self-Service Incidents API

Apr 20, 2022 By Stephanie Clegg In BigPanda

BigPanda offers a diverse set of APIs to enterprises looking to move faster and scale incident response workflows seamlessly. APIs are central to automating repetitive incident response workflows, which lets IT Ops keep up with the pace of change and innovation that agile teams need to thrive. In Q4 of 2021, BigPanda announced the general availability of new self-service APIs, including an updated Incidents API.

How Well Does Your Infrastructure Support Major Incident Management?

Apr 20, 2022 By xMatters In xMatters

Effective major incident management depends on many things, including planning, precise execution, effective communication, and applying learnings from previous incidents to update those plans. Traditional major incident management wisdom addresses the importance of the remediation process, but it doesn’t address how your IT infrastructure should be configured.

SRE Adoption | A 2-Year Retrospective (From A Business Point-Of-View)

Apr 20, 2022 By Jason Montgomery In Blameless

This month I hit my two-year anniversary with Blameless. As our industry progresses and matures, I thought it would be a good opportunity to look back at how far we have come and to ruminate on where we’re headed. Our shared vision at Blameless is to help engineering teams adopt reliability practices with ease and build a resilient culture.

The State of Incidents and Site Reliability: Q&A with Blameless SRE Architect Kurt Andersen

Apr 19, 2022 By Blameless In Blameless

In the latest installment of an occasional series, we hear from Kurt Andersen, SRE Architect at Blameless, on the evolution of incident management, current trends in site reliability affecting engineering teams, and how Blameless is addressing the needs of SRE and DevOps.

Podcast: Break Things on Purpose | JJ Tang: People, Process, Culture, Tools

Apr 19, 2022 By Jason Yee In Gremlin

For this episode, we’re continuing to “Build Things on Purpose” with JJ Tang, co-founder of Rootly, who joins us to talk about incident response, the tool he’s built, and his many lessons learned from incidents. Rootly aims to automate some of the more tedious work around incidents and to keep that work consistent. JJ chats about why he and his co-founder built Rootly, and the problems they’re trying to fix and eliminate when it comes to reliability.

Service level objectives: How SLOs have changed the business of observability

Apr 18, 2022 By Grafana Labs Team In Grafana

Forget the latest tech gadgets and the newest products. One of the most talked about trends in observability right now? “SLOs have really become a buzzword, and everyone wants them,” said Grafana Labs principal software engineer Björn “Beorn” Rabenstein on a recent episode of “Grafana’s Big Tent,” our new podcast about people, community, tech, and tools around observability.

What's behind BigPanda's customers' success?

Apr 14, 2022 By Chris LaPierre In BigPanda

As the Regional VP of Customer Success for the West and Central Region at BigPanda, Chris LaPierre gets a unique opportunity to see first-hand how customers use BigPanda’s AIOps platform. BigPanda’s customer success teams, charged with ensuring that every customer derives high value and return on investment from the solution, make sure customers leverage the AIOps platform to increase their bottom line.

Outage Alert: Top 5 Outages of Q1 2022

Apr 14, 2022 By Maddie Welsh In uptime

By now it’s no secret that system outages and website downtime are more widespread and frequent than ever. In fact, the frequency of outages jumped 9% in just the first week of 2022. This can be attributed to a rapid increase in traffic and in reliance on tech infrastructure, resulting in connectivity, server, and other technical issues that are alternately unforeseen and unavoidable.
