Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

4 Things you Need to Know about Writing Better Production Readiness Checklists

When we think of reliability tools, we may overlook the humble checklist. While tools like SLOs represent the cutting edge of SRE, checklists have been recommended in many industries, such as surgery and aviation, for almost a century. Checklists owe this long and widespread adoption to their usefulness, and they can also help limit errors when deploying code to production. In this blog post, we’ll cover: Production checklists should be holistic.

Graphite Dropping Metrics: MetricFire can Help!

Sometimes a seemingly well-configured and fully functional monitoring system can malfunction and lose metrics. As a result, you get a distorted picture of what is happening with the monitored object. In this article, we will look at the possible causes of Graphite dropping metrics and how to avoid them. MetricFire specializes in monitoring systems. You can use our product with minimal configuration to gain in-depth insight into your environments.

Application Performance Monitoring: Why is it important for your organization?

Application Performance Monitoring (APM) refers to monitoring or managing the performance of your code, application dependencies, transaction times, and overall user experience. It is an important technology that ensures applications are performing as expected. The ultimate goal of performance monitoring is to give end users a high-quality experience.

An Intro to PromQL: Basic Concepts & Examples

PromQL, short for Prometheus Query Language, is the main way to query metrics within Prometheus. You can either display an expression’s result as a graph or export it using the HTTP API. PromQL uses three data types: scalars, range vectors, and instant vectors. It also uses strings, but only as literals. This intro will provide basic PromQL examples and concepts to understand as you get used to Prometheus queries.
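
As a rough illustration of the data types mentioned above, the sketch below pairs each one with a sample expression and builds the corresponding URL for the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). The server address, the metric name `http_requests_total`, the `job="api"` label, and the helper `instant_query_url` are assumptions made for the example, not details from the post.

```python
from urllib.parse import urlencode

# Assumed Prometheus server address; adjust for your environment.
PROMETHEUS_URL = "http://localhost:9090"

# One illustrative expression per PromQL data type.
# The metric name and label are hypothetical.
EXAMPLES = {
    "scalar": "42",
    "instant vector": 'http_requests_total{job="api"}',
    "range vector": 'http_requests_total{job="api"}[5m]',
}


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    for kind, expr in EXAMPLES.items():
        print(f"{kind:15} {instant_query_url(expr)}")
```

The script only constructs URLs, so it runs without a Prometheus server; fetching one of them with any HTTP client against a live server would return the result as JSON.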

How Puppet Supports DevOps Workflows in the Windows Ecosystem

For Windows teams that adopt a DevOps approach, augmenting their native toolset (GPO, SCCM, PowerShell) can offer reliable and repeatable processes that successfully effect change. This quick overview highlights how Puppet Enterprise can complement existing Windows tools for better visibility and transparency across automation processes.

The essential config settings you should use so you won't drop logs in Loki

In this post, we’re going to talk about tips for securing the reliability of Loki’s write path (where Loki ingests logs). More succinctly, how can Loki ensure we don’t lose logs? This is a common starting point for those who have tried out the single binary Loki deployment and decided to build a more production-ready deployment. Now, let’s look at the two tools Loki uses to prevent log loss.

Close the Loop with User Feedback

Everyone’s software crashes. As an engineer, you don’t feel your users’ frustration unless they reach out to customer support, write bad reviews, or tweet about it. This feedback often lacks the information needed to resolve the issue. In some cases, you can re-engage with the customer, but that process is time-consuming and inefficient. Another option would be to examine the crash reports, but sometimes they don’t give sufficient insight to fix the problem.

Online CNCF event: Why you should use NATS for your next Cloud native application

When building Cloud applications, we often put significant effort into breaking down our monoliths into small pieces of code. These pieces are easier to maintain, but making them communicate with each other is hard. This is where NATS comes in. NATS is a simple and highly performant messaging system for Cloud-native apps. In this talk, I will share my experience using NATS at Qovery, why you should or should not use it, and how it differs from the well-known RabbitMQ and Kafka.

Three ways tight integration makes logging and monitoring easier

Driving the productivity of software development and delivery teams is critical for any organization. Six years of research by DevOps Research and Assessment (DORA) showcase the role easy-to-use tooling plays in driving this productivity and, in turn, a better work/life balance for the team. The research finds that the highest-performing teams are 1.5x more likely to have tools they consider easy to use.

Using Let's Encrypt Free Certs with your Linux Servers

Part 2 of our blog series on certificates focuses on a practical matter: using the free Let’s Encrypt certificates to secure servers that may not be publicly available, but still need better security than self-signed certs can give you. As we explained in our last blog post on this subject, to use HTTPS encryption with certificates, you can choose from a number of options.