Mistakes happen for many reasons #incidentmanagement

In this clip, Dennis Henry of Okta explains why it's important to remember that mistakes happen for several reasons and don't have a single cause. In last week’s episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response.

Pipeline Talk: Between Two Fernders Edition

Cribl’s co-founders, Clint Sharp, Dritan Bitincka, and Ledion Bitincka, recently took time to host a Between Two Fernders edition of Pipeline Talk at the Cribl offices to discuss a wide variety of topics, including Cribl Lake, the N-Gage, WWE aspirations, fishing poles, how CAT6 cabling is not named after actual cats, and whether Apple’s iPhone will be a consumer hit (Yes, we know what year it is, but the host clearly doesn’t).

How to build reliable services with unreliable dependencies

In an earlier blog post, we looked at slow dependencies and how they can impact the reliability of other services. That post explored what happens when dependencies are degraded, but what happens when they fail outright? What can you do when your application or service sends a request to another service and nothing comes back? We’ll answer this question by using Gremlin to proactively test a service with multiple dependencies.

IRL to IAC: Your Environment to PagerDuty via Terraform

Figuring out how to represent your as-built environment in PagerDuty can be confusing for new users. PagerDuty has many components that help your team succeed at managing incidents, integrating with other systems in your environment, running workflows, and using automation. Your organization might have a lot of these components – users, teams, services, integrations, orchestrations, etc.

Making Data Storage More Secure with Progress Flowmon and Veeam Backup and Replication

The new partnership between Progress and Veeam represents a significant step forward in cybersecurity. It marks a considerable advancement in data protection by merging the Flowmon AI-powered threat detection capabilities with the robust backup solution of Veeam. This empowers organizations to more effectively defend their invaluable digital assets.

Lightrun Panel Webinar with Google DORA and Priceline May 2024

In this insightful webinar hosted by Lightrun and moderated by Eran Kinsbruner, global head of product marketing and best-selling author in the software development space, we delved into the latest developments in software development and performance, focusing on the recent Google DORA report. In the first segment of the webinar, Nathen Harvey and Amanda Lewis from Google Cloud's DORA team provided a comprehensive overview of the latest report's findings, highlighting the emerging emphasis on Performance and Reliability in the industry.

Live event recap: Humanizing the on-call experience

There’s no two ways about it: on-call is stressful. But with humans at the center, it’s especially important to find ways to make it as manageable and empathetic as possible. In this webinar with our friends at ELC, incident.io VP of Engineering, Norberto Lopes, and Intercom Staff Product Engineer, Andrej Blagojević, discuss their own experiences with on-call, and how the process can be better.

Setting up your Grafana k6 performance testing suite: JavaScript tools, shared libraries, and more

Editor’s note: This blog post is the second in a series of posts about organizing your performance testing suite with Grafana k6. If you haven’t already, be sure to check out the first post in the series, which explores how to implement reusable test patterns and other best practices within your testing suite.