Accelerate change alert discovery and incident resolution with Root Cause Changes

Today, most organizations run hybrid cloud environments. As a result, operations teams face daily infrastructure and software changes and updates, which are also the primary cause of incidents and outages. Long gone are the days when a tech stack could be represented by a single dependency model. Microservices, CI/CD, and containers spread across multiple clouds make it extremely difficult to track every change and connect it to an incident.

Auto-Instrumenting OpenTelemetry for Kafka

Apache Kafka, born at LinkedIn in 2010, has revolutionized real-time data streaming and has become a staple in many enterprise architectures. As it facilitates seamless processing of vast data volumes in distributed ecosystems, the importance of visibility into its operations has risen substantially. In this blog, we’re setting our sights on the step-by-step deployment of a containerized Kafka cluster, accompanied by a Python application to validate its functionality. The cherry on top?

Joining the Power of AI and Automation: Today's Business-critical Opportunity

Artificial intelligence (AI) won’t fade anytime soon, and since generative AI (genAI) joined the party in Nov. 2022, the call for innovative business strategies will only get louder. The not-so-fun part of this growth shows up when businesses resist change and the adoption of emerging technologies. But the truth is that business leaders must step up.

Data Transformation in the BFSI Industry: From Hiccups to High Performance

We’ve been watching the Banking, Financial Services, and Insurance (BFSI) industry’s rapid evolution over the last decade, and so much of it is thanks to advancements in database technologies. BFSI firms are on a wild digital transformation journey, aiming to boost their operational efficiency, elevate customer experiences, personalize their services, and streamline their operations.

Rootless Containers - A Comprehensive Guide

Containers have gained significant popularity due to their ability to isolate applications from the diverse computing environments they operate in. They offer developers a streamlined approach, enabling them to concentrate on the core application logic and its associated dependencies, all encapsulated within a unified unit.

Scaling in Kubernetes: An Overview

Kubernetes has become the de facto standard for container orchestration, offering powerful features for managing and scaling containerized applications. In this guide, we will explore the various aspects of Kubernetes scaling and explain how to effectively scale your applications using Kubernetes. From understanding the scaling concepts to practical implementation techniques, this guide aims to equip you with the knowledge to leverage Kubernetes scaling capabilities efficiently.

The Limitations Of Combining CloudHealth And Kubecost

Ever since its release in September 2014, Kubernetes has been equally powerful and meme-able in the engineering world. For all the magic of its container orchestration and compute resource management, it’s also mysterious and, to many, confounding — especially when it comes time to pay for it. As we’ve written before, migrating to Kubernetes often means losing cost visibility.

ServiceNow acquires Enable tech to improve health and safety management

I’m excited to announce that ServiceNow has acquired the ToolBox OH&S technology assets of Enable Professional Services, a ServiceNow Elite Partner and Fujitsu company based in Australia. ToolBox OH&S technology—native to the Now Platform—will help accelerate and scale existing health and safety solutions that enhance safety management practices and streamline incident prevention and response processes for both direct and indirect employees.

OpenTelemetry metrics: A guide to Delta vs. Cumulative temporality trade-offs

OpenTelemetry metrics have two temporalities, Delta and Cumulative, and the OpenTelemetry community has a good guide on the trade-offs of each. However, that guide tackles the problem from the SDK end and does not cover the complexity that arises in the collection pipeline. This post takes that into account and covers the architecture and end-to-end considerations involved in picking a temporality.
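To make the two temporalities concrete, here is a minimal sketch in plain Python that does not use the OpenTelemetry SDK. The function names `to_cumulative` and `to_delta` and the sample numbers are hypothetical, and treating a drop in a cumulative value as a counter restart is an assumption made for illustration, not a description of how any particular collector behaves.

```python
from itertools import accumulate


def to_cumulative(deltas):
    """Running total since the start: the shape a Cumulative stream reports."""
    return list(accumulate(deltas))


def to_delta(cumulative):
    """Change since the previous report: the shape a Delta stream reports."""
    deltas = []
    previous = 0
    for value in cumulative:
        if value < previous:
            # Assumption: a drop means the counter restarted from zero,
            # so the new value is itself the change for this interval.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


# Hypothetical request counts observed in four consecutive intervals.
requests_per_interval = [5, 3, 0, 7]

cumulative_points = to_cumulative(requests_per_interval)
delta_points = to_delta(cumulative_points)

print(cumulative_points)  # [5, 8, 8, 15]
print(delta_points)       # [5, 3, 0, 7]
```

The conversion from Cumulative to Delta needs the previous value for each series, which hints at why the collection pipeline matters: whichever component converts has to hold state per series and decide what a reset means.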

Clouds, caches and connection conundrums

We recently moved our infrastructure fully into Google Cloud. Most things went very smoothly, but there was one issue we came across last week that just wouldn’t stop cropping up. What follows is a tale of rabbit holes, red herrings, table flips and (eventually) a very satisfying smoking gun. Grab a cuppa, and strap in. Our journey starts, fittingly, with an incident getting declared... 💥🚨