Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Open Source Data Lakehouse Architecture with Spark and Kyuubi: Engineering Deep Dive

This webinar gives a detailed exploration of an open source data lakehouse architecture and how we implement it at Canonical. Watch to discover how Spark’s scalable processing engine and Kyuubi’s user-friendly SQL gateway enable efficient, secure, and high-performance analytics on unified data sets. Let’s dig deeper into how this combination simplifies big data storage, interactive analytics, and ETL – all through a single, streamlined open source lakehouse architecture.

What is Database Change Management (DCM)?

Database change management is the foundation for building a stable, secure, and high-performing application. In today’s fast-paced technological landscape, where agile and DevOps are the go-to approaches for developing database applications, rapid releases and continuous iteration are the norm. But with frequent deployments comes the risk of untracked database changes.

The Complete SaaS Unit Economics Guide (2025 Edition)

Measuring and monitoring unit economics can help your SaaS brand make informed business and engineering decisions. But how do you get that data, and what exactly are SaaS unit economics? We’ll cover exactly what SaaS unit economics are, metrics you should monitor, how to calculate your unit economics, and the tools you can use to be successful.

Self-Service Query UI for Logs in Azure Data Explorer (ADX)

This video focuses on how to create a self-service user interface (UI) for querying logs using Azure Data Explorer (ADX) and the Business Activity Monitoring (BAM) module. Perfect for developers and business users aiming to gain actionable operational insights from log data with simple visualizations and monitoring.

IT Alerting: Everything You Need to Know

Behind every reliable service is a team of people watching for problems. But they don’t stare at screens all day. They rely on IT alerting systems. An IT alerting system tells you when something is wrong. It finds problems fast, so your team can fix them before your business or customers are affected. This article will explain everything you need to know about IT alerting. You’ll learn what it is, why you need it, how to set it up, and which tools work best.

A complete security view for every Ubuntu LTS VM on Azure

Azure’s Update Manager now shows missing Ubuntu Pro updates for all Ubuntu Long-Term Support (LTS) releases: 18.04, 20.04, 22.04 and 24.04. The feature was first introduced for only 18.04 during its move to Expanded Security Maintenance. With this addition, Azure highlights where Ubuntu LTS instances would benefit from Expanded Security Maintenance updates if the administrator attaches an Ubuntu Pro license, even for instances running more recent Ubuntu releases.

Top AI Prompts for Engineering Leaders using the Cortex MCP

AI assistants have transformed how developers work. And now, coupled with the Cortex MCP, which connects AI assistants directly to live service data, ownership records, and organizational standards, developers can get accurate, context-rich answers about their services and standards right in their IDE. But what about engineering leaders? Your opportunities with AI assistants extend far beyond code generation.

Fix issues faster with Recommended Remediations

You’ve successfully run a Fault Injection test and uncovered a new failure mode before it impacted customers. And the failure could have taken down your whole system if it had happened in production. Now what? Since this is a potential P1 outage, you absolutely need to address the issue, but that’s going to take some time as you dig through the service to track down the problem. Unfortunately, this is a common conflict.

True reliability takes the whole team

Reliability takes the whole team working together. Full transcript: If you really want to get good at measuring your reliability, then you have to work together as a team. Once your software engineering organization has decided, "We're gonna test these applications to make sure that they have redundancy, availability, resilience," just stick to that framework that you come up with as a team.