3 mistakes I've made at the beginning of an incident (and how not to make them)

The first few minutes of an incident are often the hardest. Tension and adrenaline levels are high, and if you don’t have a well-documented incident management plan in place, mistakes are inevitable. It was actually the years I spent managing incidents without the right tools in those high-tension moments that inspired me to build FireHydrant. I built the tool I wished I’d had when I was trying to move fast at the start of incidents.

Better Data for Public Health: How Nexleaf and PagerDuty are Monitoring Healthcare

Having a reliable power source is something many of us take for granted. It is particularly important for healthcare facilities to have a consistent, reliable power source so that care for vulnerable patients, specifically those who rely on electricity to sustain their lives, is not disrupted. In rural Sub-Saharan Africa, however, it’s estimated that only about 28% of hospitals have reliable electricity.

Authors' Cut: How Observability Differs from Traditional Monitoring

Remember the old days when an uptime of 99.9% meant you could be fairly confident everyone was having a good experience with your application? That’s not really how it works anymore. Modern distributed systems are so complex that they typically fail unpredictably, making it much harder to diagnose issues. Traditional monitoring grew out of those early days, allowing you to check the health of simpler systems.

Taking Your Kubernetes Helm Charts to the Next Level

Helm is a deployment tool for Kubernetes objects that supports package management, dependencies, and templating. In this article, we will explore how to optimize your Helm charts. To follow along, you’ll need a basic understanding of Helm and, ideally, will have written and deployed some basic Helm charts.

Does Your Team Need a Quality Assurance Engineer?

When you develop software solutions, code quality and security are top priorities and can often define your success or failure. Some teams may require a specialist who constantly checks software for bugs and issues, especially when the project is large and undiscovered bugs can have costly consequences. For small development teams or early project development stages, developers may try to work without a quality assurance engineer and test everything themselves.

TL;DR Replication from Edge to Cloud with InfluxDB

Depending on your available resources, data analysis can take place at the edge or in the cloud, but businesses don’t need to choose one location over the other. There are benefits to giving the edge autonomy to collect, process, and act on data locally. Data replication helps maintain edge autonomy and makes it easier for users to get the data they need, where they need it.

Collect More Data with Windows Server Support in Cribl Edge 3.5

Cribl Edge is the easiest and most manageable agent for exploring, processing, and collecting Observability data at the edge for Linux servers. Today, we’re excited to announce that it’s not just Linux admins whose lives have been made easier with Edge. With the Cribl Software Suite 3.5.0, Cribl Edge now supports Windows Server 2016, 2019, and 2022, bringing that same intuitive experience for deploying, setting up, and collecting observability events to your Windows infrastructure.