
3 mistakes I've made at the beginning of an incident (and how not to make them)

The first few minutes of an incident are often the hardest. Tension and adrenaline levels are high, and if you don’t have a well-documented incident management plan in place, mistakes are inevitable. It was actually the years I spent managing incidents without the right tools in those high-tension moments that inspired me to build FireHydrant. I built the tool I wished I’d had when I was trying to move fast at the start of incidents.

Authors' Cut: How Observability Differs from Traditional Monitoring

Remember the old days, when an uptime of 99.9% meant you could be fairly confident everyone was having a good experience with your application? That’s not really how it works anymore. Modern distributed systems are so complex that they often fail in unpredictable ways, which makes issues much harder to diagnose. Traditional monitoring grew out of those early days and was designed to check the health of simpler systems.

Taking Your Kubernetes Helm Charts to the Next Level

Helm is a deployment tool for Kubernetes objects that supports package management, dependencies, and templating. In this article, we will explore how to optimize your Helm charts. To follow along, you’ll need a basic understanding of Helm and, ideally, will have written and deployed some basic Helm charts.

Does Your Team Need a Quality Assurance Engineer?

When you develop software, code quality and security are top priorities and can often determine your success or failure. Some teams may need a specialist who constantly checks the software for bugs and other issues, especially when the project is large and undiscovered bugs can have costly consequences. Small development teams, or teams in the early stages of a project, may try to work without a quality assurance engineer and test everything themselves.

TL;DR Replication from Edge to Cloud with InfluxDB

Depending on your available resources, data analysis can take place at the edge or in the cloud, but businesses don’t need to choose one location over the other. There are benefits to giving the edge autonomy to collect, process, and act on data locally. Data replication helps maintain edge autonomy and makes it easier for users to get the data they need, where they need it.

Top-10 Cisco Live 2022 Announcements/Highlights

It was great to be back in person for the annual Cisco Live 2022 conference, held in Las Vegas from June 12 to June 16, 2022. CloudFabrix is a Cisco solution partner, and at booth #3645 on the show floor we showcased our Robotic Data Automation Fabric (RDAF) and how it can help accelerate AIOps and Observability projects. We got a lot of interest from enterprises, partners, and community members.

Network Basics: What Is SNMP and How Does It Work?

Simple Network Management Protocol (SNMP) is a way for different devices on a network to share information with one another. It allows devices to communicate even if they use different hardware and run different software. And despite any rumors you may hear, it’s not going anywhere anytime soon.

What It Means to Be an Incident Commander

Leadership is essential in an organization. Establishing a leadership hierarchy helps teams avoid getting confused about who to turn to with questions and concerns, allowing them to focus their efforts where needed. High-quality leadership is vital to success but becomes even more important when the pressure to resolve an issue with minimal downtime is turned up.