Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Failure Metrics & KPIs for IT Systems

The game in enterprise IT is this: delivering amazing services to your customers while also reducing costs. That means the time it takes to respond to an incident is critical. Incidents can ruin service delivery and destroy your budget. Certain incidents almost surely deliver a poor customer experience. Response times, you hear? Yep, we’re talking about MTTR, but that’s not all.

Your Self-Managed Journey to Digital Resilience

If you were one of the thousands of Splunk customers who joined us this year at.conf23, you heard our CEO Gary Steele say that Splunk's mission is to help you be digitally resilient. (And don't worry if you couldn't join us, because you can catch the keynote replays.) But what is digital resilience and how do you attain it?

Breaking Through the Threshold: Leveling up ITSI Adaptive Thresholding with Splunk AI

Adaptive thresholding is a key capability in Splunk IT Service Intelligence (ITSI) that enables customers to dynamically monitor the status of their key performance indicators (KPIs) and derive meaningful service insights and alerts.

Operational Intelligence: 6 Steps To Get Started

The ability to make decisions quickly can mean the difference between success and stagnation. Of course, quick decisions aren’t necessarily the right decisions. The right decisions are the best informed, and the best way to get informed is through data. That’s what operational intelligence is all about. In this article, we’re diving into all things operational intelligence (OI), including key benefits, goals and how to get started.

Incident Management Today: Benefits, 6-Step Process & Best Practices

Disruptive cybersecurity incidents become more and more commonplace each day. Even if nothing is directly hacked, these incidents can harm your systems and networks. Navigating cybersecurity incidents is a constant challenge — the best way to stay ahead of the game is with effective incident management.

Dashboard Studio: How to Configure Show/Hide and Token Eval in Dashboard Studio

You may be familiar with manipulating tokens via `eval` or `condition`, or showing and hiding panels via `depends` in Classic (SimpleXML) dashboards, and wondering how to do that in Dashboard Studio. In this blog post, we'll break down how to accomplish these use cases in Dashboard Studio, using the same examples that were shown at.conf23.

Log Data 101: What It Is & Why It Matters

If you mention 'log data' at a crowded business event, you'll quickly be able to tell who's in IT and who isn't. For the average person, log data is about as thrilling as a dental appointment or reconciling a years-old bank account. At the mere mention of log data, their eyes glaze over as they search for an escape from the conversation. Conversely, IT professionals' eyes light up and they become animated when the topic of log data arises.

Distributed Systems Explained

Distributed systems might be complicated…luckily, the concept is easy to understand! A distributed system is simply any environment where multiple computers or devices are working on a variety of tasks and components, all spread across a network. Components within distributed systems split up the work, coordinating efforts to complete a given job more efficiently than if only a single device ran it.

Mean Time to Repair (MTTR): Definition, Tips and Challenges

The availability and reliability of any IT service ultimately govern end-user experience and service performance, both of which have significant business impact. These two concepts — availability and reliability — are particularly relevant in the era of cloud computing, where software drives business operations, but that software is often managed and delivered as a service by third-party vendors.