
Latest News

Reliability Best Practices: How Gremlin Uses Gremlin

Ensuring software availability is essential for any SaaS company—including Gremlin. To do that, our teams need to identify the reliability risks hiding in our systems. That’s why our development, platform, and SRE teams use Gremlin regularly to perform Chaos Engineering experiments, run reliability tests, and track the reliability of our systems against our standards. Along the way they’ve picked up a thing or two about how to find and fix reliability risks with Gremlin.

Scaling Up to Keep Costs Down: Automation for Web Application Incident Management

Any organization that’s keeping up with today’s sharp rise in business demands (or better yet, getting ahead of the game) is doing so by getting innovative and jumping at the chance to do things differently. They’re not relying on the old ways or trying to use their existing toolbox. Instead, organizations are looking to the newest technologies and means of adding efficiency to as many day-to-day functions as possible.

Understanding and Optimizing CI/CD Pipelines

Building, testing and deploying software is a time-consuming process that many organizations aim to minimize by automating repeatable work wherever possible. To do so, many organizations are utilizing a continuous integration, continuous delivery (CI/CD) philosophy in combination with cloud native tools like Kubernetes to develop and deploy software at scale.

Making a move: How migrating to Ubuntu saved a life insurance company 60% in costs

Balancing high-performance operations against the need to reduce total operating costs is a classic dilemma faced by both large and small organisations. This dilemma becomes particularly important when you choose the foundation of your IT infrastructure: the operating system. A recent case study by Tech Mahindra, the multinational IT services and consulting firm, details how their partnership with Canonical enabled them to shift the balance for a major Fortune 500 life insurance company.

The Most Common Ways To Allocate Cloud Spend (+ The Pros And Cons Of Each)

All the major cloud providers allow users to attach business context to their infrastructure in some way. It’s this context that allows users to divide up their cloud bill into more easily digestible bites and keep track of cost trends for different resource types. Thorough cloud cost allocation gives companies the ability to make educated business decisions.
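As a rough illustration of the idea, the sketch below groups billing line items by a tag and totals the cost per tag value. The record layout, the `team` tag key, and the `allocate_by_tag` helper are all hypothetical and do not correspond to any provider's billing export format.

```python
from collections import defaultdict


def allocate_by_tag(line_items, tag_key, untagged_label="untagged"):
    """Sum cost per value of `tag_key`; items without the tag go to `untagged_label`."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing records, invented for illustration only.
line_items = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"team": "payments"}},
    {"resource": "db-1", "cost": 300.0, "tags": {"team": "payments"}},
    {"resource": "vm-2", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "bucket-1", "cost": 15.5, "tags": {}},
]

totals = allocate_by_tag(line_items, "team")
```

Keeping untagged spend in its own bucket makes gaps in tagging coverage visible instead of silently dropping that cost.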

SQL Server Terms Translated into PostgreSQL

The rise in popularity of open-source RDBMSs has encouraged many organizations to adopt PostgreSQL, but for a DBA or developer, exploring a new database platform can be challenging, no matter how experienced you are. SQL Server has many similarities to PostgreSQL, but there are several big differences too.

Monitoring Redis Clusters with Prometheus

This article will outline what Redis database monitoring is and how to set up a Redis database monitoring system with MetricFire. Then we’ll show what the final graphs and dashboards look like when displayed on Grafana. We will be using Prometheus and Grafana to power the monitoring, and we’ll use a simulated Redis DB to generate the data for the Grafana dashboards.

Mastering Kubernetes Pod Restarts with kubectl

Managing containerized applications efficiently in the dynamic realm of Kubernetes is essential for smooth deployments and optimal performance. Kubernetes empowers us with powerful orchestration capabilities, enabling seamless scaling and deployment of applications. However, in real-world scenarios, there are situations that necessitate the restarting of Pods, whether to apply configuration changes, recover from failures, or address misbehaving applications.
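One common way to restart the Pods behind a Deployment is `kubectl rollout restart`. The minimal sketch below only builds that command line in Python; the `rollout_restart_command` helper, the deployment name `web`, and the namespace `staging` are hypothetical, and the article itself may cover other approaches.

```python
import shlex


def rollout_restart_command(deployment, namespace="default"):
    """Build the argument list for a rolling restart of one Deployment."""
    return [
        "kubectl", "rollout", "restart",
        f"deployment/{deployment}",
        "--namespace", namespace,
    ]


# Hypothetical deployment and namespace names, for illustration only.
cmd = rollout_restart_command("web", namespace="staging")
command_line = shlex.join(cmd)
print(command_line)
```

Building the command as a list rather than a single string means it can be passed to `subprocess.run(cmd, check=True)` without shell quoting concerns, assuming `kubectl` is installed and configured for the target cluster.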