Enhanced Flexibility and Security Monitoring - New in DataStream

This update delivers significant advances in operational flexibility and security monitoring capabilities. It addresses the evolving needs of security teams across diverse deployment environments, from air-gapped networks to those prioritizing automation and simplicity, while expanding integration options and improving visibility into data flows.

Why SELinux Matters in Enterprise Security

When evaluating cybersecurity products, it's easy to focus on surface-level features like dashboards, alerts and integrations. But real strength often lies deeper, in the architecture itself. One embedded capability that demonstrates rigorous security design principles is Security-Enhanced Linux (SELinux). Originally developed by the U.S. National Security Agency (NSA) and released to the open-source community, SELinux is a mandatory access control (MAC) framework built into the Linux kernel.

What Is Business Continuity?

A single outage can stop operations, affect customers, and damage trust. In a world of pandemics, cyberattacks, weather events, and supply chain delays, your team cannot simply hope that nothing breaks. Business continuity planning helps your team stay ready, recover faster, and keep downtime low. In this blog, we’ll explain what business continuity means, how to create a solid business continuity plan, and which approaches help teams stay operational during a disruption.

What Is the Incident Response Lifecycle?

The Incident Response Lifecycle is a step-by-step process that helps engineering teams detect, respond to, and recover from unexpected system disruptions or outages. It includes a series of six practical stages: Detection, Analysis, Impact Mitigation, Incident Resolution, Service Restoration, and Post-Incident Analysis. By following this lifecycle, teams can minimize downtime, reduce business impact, and continuously strengthen system reliability.
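
The six stages listed above can be modeled as an ordered sequence. What follows is a minimal sketch, not part of the original post: the stage names come from the text, while the `Stage` enum and the `next_stage` helper are hypothetical names introduced only for illustration. It also assumes the stages run strictly in order, which real incidents do not always do.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The six lifecycle stages named in the text, in their stated order."""
    DETECTION = 1
    ANALYSIS = 2
    IMPACT_MITIGATION = 3
    INCIDENT_RESOLUTION = 4
    SERVICE_RESTORATION = 5
    POST_INCIDENT_ANALYSIS = 6


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the last one.

    Hypothetical helper: assumes a strictly linear progression.
    """
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None
```

Calling `next_stage(Stage.DETECTION)` yields `Stage.ANALYSIS`, and the final stage returns `None`. A team whose incidents loop back, for example from resolution to analysis, would need a richer transition table than this linear one.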

Why your Kubernetes clusters and GPUs should live under one roof

The world remains abuzz with AI hype, but the reality is that most modern applications aren’t purely AI workloads. The average company will have web services, APIs, databases, and background jobs running alongside its machine learning inference or training components. An architecture question everyone faces: should your Kubernetes cluster and GPU compute live in the same data center, or can you split them across providers?

How to manage ilert call flows via Terraform

Call flows let you design voice workflows with nodes like “Audio message,” “Support hours,” “Voicemail,” “Route call,” and more. The ilert Terraform provider now includes an ilert_call_flow resource so you can version and promote these flows across environments. This blog post offers an overview of managing call flows in Terraform, detailing the benefits and key scenarios.

Clarity in the Dojo: The power of the Summary Agent

In the dojo, not every role is about throwing punches. Some roles are about awareness, the unmistakable voice that tells the fighter when to move, where the strike is coming from, and why the opponent matters. That’s the role of the Summary Agent in Sumo Logic Dojo AI. Unlike a traditional agent, it doesn’t launch queries or carry out actions on its own. Its purpose is to narrate, not act. In doing so, it becomes the foundation for every other decision in the dojo.

Demystifying WMI Permissions

Network administrators are always seeking a deeper understanding of their Windows-based environments. Windows Management Instrumentation (WMI) enables their network monitoring tools to access system information, manage configurations and automate tasks. It plays a vital role in network monitoring by providing a standardized interface for querying and controlling system components. A complex set of permissions governs WMI access.

A quick recap of IDPCON 2025

Two weeks ago, we hosted IDPCON 2025, and the response has been overwhelming. Over 250 engineering leaders from 20+ countries joined us for 12 sessions featuring speakers from Canva, Skyscanner, Blackstone, and more. Attendees participated in discussions at 20+ roundtables, sharing strategies and challenges around engineering excellence and internal developer portals.

Unpacking the Elements of Site Uptime (by way of Jeopardy!)

Picture this: you’ve achieved your second lifelong dream of being a contestant on Jeopardy! Now it’s time for the fateful “final answer.” The good news? You’ve got a comfortable lead over your fellow contestants, and a correct response means eternal bragging rights. The bad news? Miss this one, and everyone — your family, coworkers, dentist, mechanic — will remind you of it forever. The lights dim. The audience holds its breath.