Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Our IT transformation strategy equals three zeros

When I joined ServiceNow, IT Operations had what I’d call a “best effort” strategy: push hard, keep things running, meet your SLAs, and so on. This worked well enough, but it was hardly a formula for long-term success at scale. We needed a new way of thinking that would move us from our traditional role as incident firefighter to that

Solving Runaway Series Cardinality When Using InfluxDB

In this post, you’ll learn what causes high series cardinality in a time series database and how to locate and eliminate the culprits. First, for those of you just encountering this concept, let’s define it: The number of unique database, measurement, tag set, and field key combinations in an InfluxDB instance. Because high series cardinality is a primary driver of high memory usage for many database workloads, it is important to understand what causes it and how to resolve it.
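The definition above can be made concrete with a small in-memory sketch. This is not InfluxDB's API or its internal implementation; the `series_cardinality` function and the sample points (including the `request_id` tag) are hypothetical, chosen only to show how one high-variety tag multiplies the number of series.

```python
def series_cardinality(points):
    """Count unique (database, measurement, tag set, field key) combinations.

    `points` is a hypothetical in-memory representation, not an InfluxDB type.
    """
    series = set()
    for point in points:
        # A tag set is the full collection of tag key/value pairs on a point.
        tag_set = frozenset(point["tags"].items())
        for field_key in point["fields"]:
            series.add((point["database"], point["measurement"], tag_set, field_key))
    return len(series)


points = [
    {"database": "metrics", "measurement": "cpu", "tags": {"host": "a"}, "fields": {"usage": 0.5}},
    {"database": "metrics", "measurement": "cpu", "tags": {"host": "b"}, "fields": {"usage": 0.7}},
    # Same series as the first point: only the field value differs.
    {"database": "metrics", "measurement": "cpu", "tags": {"host": "a"}, "fields": {"usage": 0.9}},
    # A per-request tag creates a new series for every distinct value.
    {"database": "metrics", "measurement": "cpu", "tags": {"host": "a", "request_id": "r1"}, "fields": {"usage": 0.1}},
    {"database": "metrics", "measurement": "cpu", "tags": {"host": "a", "request_id": "r2"}, "fields": {"usage": 0.1}},
]

print(series_cardinality(points[:3]))  # two hosts, one field key
print(series_cardinality(points))      # the request_id tag adds two more series
```

In this toy data the first three points produce two series, while the two points tagged with `request_id` each add a series of their own, which is the usual pattern behind runaway cardinality.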

Working in the SOC with Power Tools: Splunk and Polarity

Have you ever had to saw through a board by hand? I had to finish a partial cut by hand the other day while building a new mantel for my fireplace. It’s slow and difficult, and it often results in a lower-quality cut than one done with a power tool. It’s good exercise, though! We should all have to do it at least once so we appreciate our power tools more.

Sponsored Post

Reduce MTTR with Crowd-Sourced Analytics

The new normal for enterprises today is to have the vast majority of their employees working remotely across multiple geographic locations and communicating through cloud applications such as Office 365 and Slack, or video conferencing tools such as Microsoft Teams and Zoom. As more users avoid travel and stay at home due to COVID-19, it becomes critical that the underlying infrastructure monitoring these applications responds immediately to service disruptions and sub-optimal performance. The slower an application becomes, the greater its negative impact on employee productivity and on the firm's ability to conduct business smoothly.

Understanding how attackers move inside your organization

Cyberthreats have been coming at us from the left, right, and center. The number of cyberattacks is forever on the rise, and companies need to keep ramping up their security measures to protect themselves. It’s important that these measures cover every aspect of a network environment. To understand why monitoring your whole environment is so important, let’s take a look at what an attacker might do once they get inside your organization.

Monitor Auth0 with Datadog

Auth0 provides identity as a service (IDaaS), allowing you to secure your apps and APIs without having to write your own authorization code. Auth0 can work with social identity providers (IdPs) like Google and Facebook so your users can access your app by using their existing accounts for authentication. You can also use an existing enterprise identity provider (e.g., LDAP) to allow your users to leverage single sign-on (SSO) across multiple apps.

NiCE Active 365 Management Pack 3.20 released

Microsoft 365 outages happen. The NiCE Active 365 Management Pack for SCOM is an intelligent monitoring set for Microsoft Teams, SharePoint, OneDrive, and Office 365 hosted on-premises. The NiCE Active 365 Management Pack ensures end-to-end control for your Microsoft 365 services. IT executives, operators, and administrators can now actively improve the Microsoft 365 user experience by starting advanced monitoring today.

Now in Beta: SLA Monitor by StatusGator

StatusGator has been monitoring hundreds of status pages for more than 5 years. During this time, we’ve collected millions of data points about the status of the cloud: What went down, how long it was down, messages about why, and more. As StatusGator grows, we’re working on ways to incorporate this archival data into StatusGator for various uses. One common use: holding vendors accountable to Service Level Agreements. Introducing: SLA Monitor by StatusGator.
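As a rough illustration of the arithmetic behind holding a vendor to an SLA, the sketch below turns a list of outage durations into an availability percentage and compares it with a target. It does not describe how StatusGator's SLA Monitor works; the function names, the 30-day period, and the outage figures are all hypothetical.

```python
from datetime import timedelta


def availability_percent(period, outages):
    """Percentage of `period` during which the service was up."""
    downtime = sum(outages, timedelta())
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (1 - downtime / period)


def meets_sla(period, outages, target_percent):
    """True if measured availability is at or above the SLA target."""
    return availability_percent(period, outages) >= target_percent


# Hypothetical month with two outages totalling 50 minutes.
period = timedelta(days=30)
outages = [timedelta(minutes=30), timedelta(minutes=20)]

print(round(availability_percent(period, outages), 3))
print(meets_sla(period, outages, 99.9))  # 99.9% allows about 43 minutes per 30 days
print(meets_sla(period, outages, 99.5))
```

With these made-up numbers the vendor would miss a 99.9% target, which permits roughly 43 minutes of downtime in 30 days, but would meet a 99.5% target.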

Exploring Node.js Async Hooks

Have you ever heard of the Node.js async hooks module? If the answer is no, then you should get familiar with it. Even though it’s fairly new (introduced in Node.js 8) and the module is still experimental, which means it’s not recommended for production, you should still get to know it a bit better. In short, the async_hooks module provides a clear and easy-to-use API to track async resources in Node.js.

The Ultimate, Free Incident Retrospective Template

Incident retrospectives (or postmortems, post-incident reports, RCAs, etc.) are the most important part of an incident. This is where you take the gift of that experience and turn it into knowledge. This knowledge then feeds back into the product, improving reliability and ensuring that no incident is a wasted learning opportunity. Every incident is an unplanned investment and teams should strive to make the most of it.