Latest News

2021 is the Year of Reliability

There’s no better time than now to dedicate effort to reliable software. If it wasn’t apparent before, this past year has made it more evident than ever: people expect their software tools to work every time, all the time. This shift in the way end users think about software was inevitable once everyday applications entered our lives, much as water and electricity entered our homes.

The Secret of Communicating Incident Retrospectives

In the world of SRE, incidents are unplanned investments in reliability. Why? Because they are valuable opportunities to learn and grow. This perspective can be difficult to communicate to other stakeholders. Some may be upset about the cost incurred or the affected customers. Others might not understand why incidents happen in the first place. It is important to show how the lessons of an incident are relevant to each stakeholder role.

How to collect HAProxy metrics

This article is a full tutorial on HAProxy monitoring and the best tools to get it done right. We will be looking into how to collect HAProxy metrics using a collectd daemon, push them into Graphite and visualize them in Grafana. To follow the steps in this blog, sign up for the MetricFire free trial, where you can use Graphite and Grafana directly in our platform.

I used Rust in production for 6 months! Here's my feedback

Are you in two minds when it comes to learning new programming languages? You may have felt the same way when you first heard about the Rust programming language. Good things require some effort, and here's what I have to say after using Rust in production for six months: it is great, simply superb! Let's get a clear, practical picture of the experience with Rust at Qovery.

Kubernetes right-sizing at the container level for fine-tuned application efficiency

Spot by NetApp’s Ocean continuously optimizes Kubernetes clusters with a wide feature set that tackles different aspects of running and managing Kubernetes containers in a cloud environment. One such aspect is the container resource requests defined in the cluster (upon which Ocean intelligently bin-packs pods on the underlying cloud VMs). Incorrect assumptions about the CPU and memory an application requires can incur unnecessary and costly cloud infrastructure waste.

Discord Bot Part 1: Getting started the right way

I’ve recently started working on a new project to build a Discord bot in Go, mostly as a way to learn more Go but also so I can use it to manage various things in Azure and potentially elsewhere. I figured it’d be useful to document some of this project to give some insights as to what I’ve done and why. First up was setting up the CI/CD pipeline for it so that I don’t need to worry about it later and can save myself a bunch of time when testing.

Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial, and Incognia

Downtime costs more than dollars. It also costs customer happiness and trust. So how do teams maximize reliability while scaling? Tooling, communication, observability, and more all play into a complete reliability strategy. In a recent industry leaders’ roundtable hosted by Blameless, top experts discussed best practices for responding to incidents, scaling for reliability, and engineering with the customer in mind.

Five Network Considerations For Remote Working

Many businesses put temporary measures in place last year to support remote working. With the shift to remote working appearing more long-term, businesses are now starting to think more strategically about how their networks can support a virtual workforce. Here we look at five network considerations to support your virtual workforce…