Operations | Monitoring | ITSM | DevOps | Cloud

Ownership change of the ansible-collection-icinga to NETWAYS

NETWAYS has long played a leading role in maintaining the Ansible Collection Icinga, contributing features and bug fixes. Now it’s official: the collection is moving into the NETWAYS namespace (on GitHub and Ansible Galaxy). The people involved in the repository will remain largely the same.

What are Microservices? A Path to Scalability and Agility

If developing scalable, agile applications is a priority for your business, microservices may provide a compelling solution. But what exactly are microservices? The term refers to a modern architectural approach in which an application is built as a collection of loosely coupled services. Each service is independent, self-contained, and designed around a specific business capability.

Under the hood: Request coverage feature

The ilert mobile app is primarily used by responders to receive notifications about critical alerts, react to them on the go, and check their current on-call status. It has various capabilities, including critical notifications via push, quick actions for alerts, and critical alert settings. The app enables responders to view their current on-call shifts and escalation policies, take on-call shifts from somebody else, and create coverage requests to ask for on-call shift handover from a colleague.

Surprised By Your AWS ELB Bill? Here's What Happened

On May 1st, AWS corrected a long-standing billing bug tied to Elastic Load Balancer (ELB) data transfers between Availability Zones (AZs) and regions. That fix triggered a noticeable increase in charges for many users, especially for those with high traffic volumes or distributed architectures. The problem wasn’t new usage; it was a silent correction to an old error.

VPC Log Format: Custom and Advanced Configurations

VPC Flow Logs come with a default format that gives you basic network traffic details. But you can tweak the format to capture exactly what you need. This can lower costs, speed up processing, and make your logs fit better with what you’re trying to monitor. If you want to improve security, keep an eye on performance, or save money, adjusting your VPC logs can make a big difference. Let’s take a look at some practical ways to customize your logs beyond the default settings.

A Simple Guide to Monitoring and Optimizing Prometheus CPU Usage

Prometheus is supposed to help you monitor your stack, not become the thing you need to monitor. But if you’ve ever seen it spike in CPU and slow everything down, you know that’s not always the case. High Prometheus CPU usage usually shows up when you're scraping too many metrics, using expensive queries, or running with default configs that don’t fit your workload. This guide covers how to track Prometheus CPU usage, what typically causes it, and how to fix it.

PagerDuty + Microsoft Build 2025: Transforming critical work with AI and automation

At Microsoft Build 2025, PagerDuty was featured in key announcements showcasing how intelligent agents and real-time automation redefine digital operations. From Microsoft Copilot to the launch of a new Azure SRE Agent, PagerDuty was highlighted as a strategic partner in enabling intelligent, scalable incident response.

SAML authentication in Grafana Cloud: a guide for easy configuration

In my role as Senior Observability Architect here at Grafana Labs, one of the things I focus on is making sure customers are getting the most out of our products. Recently, I noticed a trend where customers were struggling to get SAML authentication configured properly. They were getting stuck on some of the steps needed to configure the users key pair values, which allows users to log in with the correct roles assigned in Grafana.

Harnessing Network Observability to Enhance Grid Resilience

Within the utility sector, a lot is changing. Utilities continue to pursue digital transformation, altering the way services are delivered and operations are managed. What hasn’t changed is the criticality of the services provided. These organizations deliver essential resources like natural gas, electricity, and water—services that we as consumers rely upon constantly for our comfort, sustenance, communications, and more.

Preparing for the Autonomous Future

Throughout this blog series, we’ve followed how AI reshapes network operations – from foundational data harmonization to real-time correlation, from contextual insights to agent-driven automation, and most recently, to conversational access through natural language interfaces. But we haven’t reached the final destination.