
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Announcing HAProxy ALOHA 17.0

HAProxy ALOHA 17.0 is now available, delivering powerful new features that improve UDP load balancing, simplify network management, and enhance performance. With this release, we’re introducing the new UDP Module and extending network management to the Data Plane API, a new API-based approach to network configuration. The Network Management CLI is enhanced with exit status codes and contextual help.

Rethinking WhatsApp Alerts - A Data-Driven Approach

WhatsApp has become a major alerting channel for incident response teams. It's popular and, for many, a great alternative to SMS. In our 2024 recap, we mentioned that Spike sent over 25,000 alerts on WhatsApp. It is now the second most used alert channel for responders on Spike (up from fourth in 2023). But I will be the first to admit that the WhatsApp alerts experience needed work to help responders react to incidents faster!

Proactive Monitoring: How DinoCloud Uses CloudWatch to Save Clients Money

At MetricFire, we love talking with engineers about their tech stacks, SRE challenges, and how they approach infrastructure monitoring. Recently, we had a great chat with Yoimer Roman from DinoCloud, a Latin American company that helps clients make smarter business decisions by leveraging AWS CloudWatch monitoring. Yoimer wears many hats: mentoring his team on all things AWS, designing custom cloud environments, and bridging the gap between technical challenges and non-technical stakeholders.

Using CircleCI to test and deploy Python serverless functions on Microsoft Azure

Serverless computing simplifies app development by abstracting away server management. Azure Functions provides a robust platform for event-driven, on-demand code execution. In this tutorial, we’ll use CircleCI to create and deploy a Python-based Azure Function that parses incoming JSON. To enable more granular, programmatic access to Azure resources, we’ll use a service principal for secure authentication and the Azure CLI orb to streamline our CI/CD pipeline.

Unlocking Edge AI: a collaborative reference architecture with NVIDIA

The world of edge AI is rapidly transforming how devices and data centers work together. Imagine healthcare tools powered by AI, or self-driving vehicles making real-time decisions. These advancements rely on bringing AI directly to edge devices. However, building a robust architecture for diverse edge environments presents significant hurdles. This blog introduces our new reference architecture, designed to simplify edge AI deployment.

Building optimized LLM chatbots with Canonical and NVIDIA

The landscape of generative AI is rapidly evolving, and building robust, scalable large language model (LLM) applications is becoming a critical need for many organizations. Canonical, in collaboration with NVIDIA, is excited to introduce a reference architecture designed to streamline and optimize the creation of powerful LLM chatbots. This solution leverages the latest NVIDIA AI technology, offering a production-ready AI pipeline built on Kubernetes.

PagerDuty Setup: From Beginner to Pro in 10 Steps

This comprehensive guide walks you through the complete PagerDuty setup process, organized into 10 steps. We've structured the guide to match your team's growth journey: it starts with essential configurations for small teams, advances to robust solutions for growing teams, and wraps up with enterprise-grade features for large organizations. By the end, you'll have a fully operational incident management system on PagerDuty, tailored to your specific needs.

Observability Reimagined: How AI is Transforming Monitoring

Observability needs to evolve. With AI reshaping IT monitoring, how can businesses leverage predictive analysis, AI-driven monitoring, and auto-remediation workflows to create more resilient infrastructures? At Civo Navigate San Francisco 2025, Jemiah Sius of New Relic explores how AI is transforming observability, shifting it from reactive responses to proactive, intelligent solutions.

How to Prove Your Network Operation Center (NOC)'s Effectiveness

If you’re a telecom provider, you already know that the network operations center (NOC) is integral to service delivery, maintaining uptime, and continuous optimization. These and other vital functions are what empower you to provide seamless service for your customers and stay one step ahead of your competitors. Your team knows all of this already, but how do you demonstrate the effectiveness of your NOC to external stakeholders and leaders?