Operations | Monitoring | ITSM | DevOps | Cloud

Ask Miss O11y: Long-Running Requests

You need not fear a long-lived streaming workload. A few simple tricks can transform a request that may not terminate for hours or days into something you can get regular health and status updates on. We in fact have one of those continuous processing services (Beagle, our Service Level Objective stream processor), which we’ve instrumented in this fashion.

AI-Powered Monitoring Could Have Saved Millions for Global Bank

As most people were preparing to celebrate the new year, the UK’s Santander Bank was dealing with a crisis. On Christmas Day, roughly 75,000 people who received payments from companies with accounts at Santander Bank were sent a duplicate payment. The total damage amounted to £130m, and recovery in these situations is a painful process for both the bank and its customers.

The Business Case for Observability and Site Reliability Engineering

Unlike traditional IT Ops, the role of the SRE isn’t simply focused on finding and solving technical problems. The big win for today’s SREs is supporting the organization’s strategic innovation initiatives. With the appropriate observability capabilities, it’s possible to quantify the value that software infrastructure contributes to this innovation effort.

The Top 5 Use Cases for AIOps Today

By now, you’ve likely heard of AIOps, a technique that promises to inject new levels of efficiency into IT operations with the help of AI and machine learning. But what, exactly, does AIOps mean in practice? Which specific use cases can IT organizations enable or improve with the help of AIOps? Those may be more difficult questions to answer if you have yet to see AIOps at work in your organization.

Kickstart your Splunk App with @splunk/create

I’ve been contributing to, and creating, Splunk apps for the better part of the last 10 years. But never have I felt more excited to be a Splunk Developer than right now. One of the primary reasons I am so excited is build tools like @splunk/create. At Splunk, we recognize that developers are crucial to our entire ecosystem.

Monitoring AWS Spot instances using Sumo Logic

Spot worker nodes on EKS (Elastic Kubernetes Service) are a great way to save costs by allowing customers to take advantage of unused capacity. At Sumo Logic, we have experimented with and adopted spot worker nodes for some of our EKS clusters to see if we can pass along the same benefits. We decided to share some of the learnings, challenges, and caveats of using spot instances, along with the monitoring setup.

6 Tips on How to Run Your CAB Like A Boss

The change advisory board (CAB) can be one of the most important and useful meetings a service-oriented IT organization holds. It sets out a view of what’s happening to key services over the next week (or longer depending on its frequency and timeframe), confirms the impact to the business, reviews previous change activity, and looks at continual service improvement (CSI).

Planning a Pain-Free Cloud Migration. Interview with David Colebatch, CEO / Chief Migration Hacker

Ever wonder how best to plan for a successful cloud migration? I sat down with David Colebatch, CEO / Chief Migration Hacker at Tidal Migrations to better understand what it takes to approach a cloud migration with confidence. David will help to demystify cloud migration planning and further change our perceptions of the cloud.

How to use OpManager as an effective disk space monitor for your network monitoring environment

Disk space availability on servers is crucial. Applications that run on these servers save log files and write data to a database that is also installed on the server; if there isn’t enough disk space, the application may not work properly and may crash. Monitoring disk space is critical for IT administrators, who maintain server performance and network availability by preventing a sudden and unexpected shortage of server disk space.