Latest Posts

Five worthy reads: The unexpected costs following a cyberattack

Five worthy reads is a regular column on five noteworthy items we’ve discovered while researching trending and timeless topics. In this edition, we’ll look at the worst recent data breaches, their impact, and what data breaches cost companies. The COVID-19 pandemic has affected not only the mental and physical health of employees, but also the digital health of organizations around the world.

Collect Amazon CloudWatch metrics faster with Datadog using CloudWatch Metric Streams

Quick access to metrics and health signals from your AWS environment is essential for identifying issues promptly and monitoring the effects of any deployed fixes. Datadog is proud to partner with AWS for the launch of CloudWatch Metric Streams, a new feature that lets AWS users forward metrics from key AWS services to different endpoints, including Datadog, via Amazon Kinesis Data Firehose with low latency.

ITOM vs ITSM

Is it important to your customer to understand the distinction between IT Operations Management (ITOM) and IT Service Management (ITSM)? Most likely not. Your customers are only concerned with how quickly you solve their problems. It makes no difference to them which application or infrastructure you use. So why should you, as an organization, be concerned about the meaning of these concepts, how they align, how they differ, and why they matter?

Exposing More Public Endpoints for Sending Metrics and Errors to AppSignal

Today, we launch a new feature: sending metrics and errors to AppSignal over our “Public Endpoint” API. AppSignal automatically instruments many web frameworks, databases, and background job frameworks when you monitor a Node.js, Ruby, or Elixir app. If you have code running on a serverless architecture such as AWS Lambda, you can’t run our agent, so our standard integration with all of its out-of-the-box magic won’t work.

Not knowing real-time asset intelligence is a non-starter

Complexity breaks correlation. Intelligence brings cohesion. This simple principle is what makes real-time asset intelligence a must-have for any AIOps platform meant to defuse complexity. To give the user further context, it is critical to understand service dependencies and correlate alerts across the stack to resolve incidents. CMDB systems have been useful for breaking down configuration items into logical layers. But that is not enough, because their data can quickly become outdated.

How to configure services in Squadcast: Best practices to reduce MTTR

With the rise of digital platforms, IT infrastructure has grown so complex that multiple application interdependencies coexist with varied architectures and on-call team types. This blog looks at how you can model your infrastructure in Squadcast to reduce the time it takes to respond to and resolve incidents.

Comparing Real User Monitoring and Synthetic Transactions

Written by Nick Cavalancia, Microsoft Cloud & Datacenter MVP

The need for visibility into service availability and delivery quality has led to rising interest in monitoring Microsoft’s Office 365 services from the user perspective. With two different approaches available, what value does each bring?

How to Improve Core Web Vital Scores

From May 2021, Google is using ‘Core Web Vitals’ as a brand new ranking signal. Google states that business owners should monitor and improve their scores to avoid damaging their organic SEO. In this blog, we will explain how to improve Core Web Vitals scores. To discover the specific issues affecting your users’ experience, we strongly advise having a Core Web Vitals audit.

Feature preview: Trigger agent runs and report collection from Mission Portal

If you are debugging issues with a host, it is quite common to want to make changes to CFEngine policy and speed up the process of fetching, evaluating, and reporting for that host. You can do this by running cf-runagent and cf-hub from the command line, and we have now brought this functionality into Mission Portal.

Intro to exemplars, which enable Grafana Tempo's distributed tracing at massive scale

Exemplars have recently become a hot topic in observability, and for good reason. Just as Prometheus disrupted the cost structure of storing metrics at scale, beginning in 2012 and for real in 2015, and Grafana Loki disrupted the cost structure of storing logs at scale in 2018, exemplars are doing the same for traces. To understand why, let’s look at both the history of observability in the cloud native ecosystem and the optimizations that exemplars enable.