On moving over a million uptime checks per week onto fly.io

The other day, a friend told me about fly.io's nice developer experience (DX). For my day job, I work on improving wrangler2's DX, so naturally it had me curious. I went from "I'll just play around with it, maybe give it a toy workload" to "holy shit, what if I quickly rewrite my business's AWS Lambda + SQS stack to fit entirely within their free tier" in about 90 minutes. It wasn't that simple in the end, but I did manage to migrate most of my active workload from AWS Lambda to fly.io.

Basic Characteristics and Objectives of Corrective Maintenance

Organizations use several types of maintenance; some are proactive and others are reactive. However, many organizations don't know which type suits their business. In this blog post, we'll look at corrective maintenance, its objectives, and whether it's a good fit for your business. So, without wasting any time, let's begin!

What Is Cost Optimization? 8 Best Practices To Use ASAP

You're not alone if you have trouble visualizing, measuring, and controlling your IT costs. Whether public or private, cloud computing can add to the problem. Because the cloud offers resources on demand, a misconfigured piece of infrastructure, an overzealous engineer, or a blindsided operations team can trigger a surprise bill at the end of the month. But not so fast: what is cost optimization, and how does it differ from IT cost reduction?

3 mistakes I've made at the beginning of an incident (and how not to make them)

The first few minutes of an incident are often the hardest. Tension and adrenaline levels are high, and if you don’t have a well-documented incident management plan in place, mistakes are inevitable. It was actually the years I spent managing incidents without the right tools in those high-tension moments that inspired me to build FireHydrant. I built the tool I wished I’d had when I was trying to move fast at the start of incidents.

Better Data for Public Health: How Nexleaf and PagerDuty are Monitoring Healthcare

Having a reliable power source is something many of us take for granted. For healthcare facilities, a consistent, reliable power supply is especially important, because care for vulnerable patients, particularly those who depend on electricity to stay alive, must not be interrupted. In rural Sub-Saharan Africa, however, it's estimated that only about 28% of hospitals have reliable electricity.

Authors' Cut: How Observability Differs from Traditional Monitoring

Remember the old days when, if you had 99.9% uptime, you could be fairly confident that everyone was having a good experience with your application? That's not really how it works anymore. Modern distributed systems are so complex that they often fail in unpredictable ways, which makes issues much harder to diagnose. Traditional monitoring grew out of those early days, when checking the health of simpler systems was enough.

Taking Your Kubernetes Helm Charts to the Next Level

Helm is a deployment tool for Kubernetes objects that supports package management, dependencies, and templating. In this article, we will explore how to optimize your Helm charts. To follow along, you'll need a basic understanding of Helm, and ideally you will have already written and deployed a few basic Helm charts.

Does Your Team Need a Quality Assurance Engineer?

When you develop software, code quality and security are paramount and can often determine success or failure. Some teams need a specialist who continually checks the software for bugs and other issues, especially on large projects where undetected bugs can have costly consequences. On small teams or in the early stages of a project, developers may try to work without a quality assurance engineer and test everything themselves.