
Latest News

Automate workflows with Datadog's Amazon EventBridge integration

Amazon EventBridge is a serverless event bus that routes real-time data streams from your applications and services to targets like AWS Lambda. EventBridge facilitates event-driven application development by simplifying the process of ingesting and delivering events across your application architecture, and by providing built-in security and error handling. We are excited to announce that you can now use our new integration to route Datadog alerts to EventBridge with minimal configuration or setup.
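To illustrate the routing model, here is a minimal sketch of a Python AWS Lambda function acting as an EventBridge target. It relies only on the standard EventBridge envelope, where `source` and `detail-type` identify the event and the source-specific payload sits under `detail`. The handler name `lambda_handler` and the sample values in the usage note are hypothetical, and the sketch assumes nothing about the schema Datadog uses inside `detail`.

```python
import json


def lambda_handler(event, context):
    """Hypothetical Lambda target for events delivered by an EventBridge rule."""
    # Standard EventBridge envelope fields; the producer-specific payload
    # (for example, an alert body) is nested under "detail".
    source = event.get("source")
    detail_type = event.get("detail-type")
    detail = event.get("detail", {})

    # Log what arrived; a real workflow would act on the payload here.
    print(f"Received {detail_type!r} from {source!r}")
    print(json.dumps(detail, indent=2))

    return {"source": source, "detail_type": detail_type}
```

Invoking it locally with a made-up event such as `{"source": "example.source", "detail-type": "Example Alert", "detail": {}}` lets you exercise the handler before connecting it to a rule.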

PerfOps FlexBalancer Raw Logs

Our latest product release contains plenty of new features, so we decided to list the major ones that our users will find most exciting. We now make it even simpler to debug your load balancing by providing access to raw logs for FlexBalancer. All requests hitting our backend are now logged and available for analysis by our customers directly on the Analytics page. When you set up complex load-balancing rules, it can be hard to understand how they will perform in rare use cases.

Cross-tenant monitoring with Azure Lighthouse and Datadog

Azure Lighthouse is a new feature that provides improved access management for users and applications across different Azure tenants. With Azure Lighthouse, managed service providers (MSPs) can manage their customers’ environments more easily and efficiently than ever before. Datadog is proud to announce support for Azure Lighthouse, which ensures that MSPs can implement a streamlined, scalable approach to monitoring their customers’ Azure environments.

A Guide to Open Source Monitoring Tools

Open source is one of the key drivers of DevOps. The need for flexibility, speed, and cost-efficiency is pushing organizations to embrace an open source-first approach when designing and implementing the DevOps lifecycle. Monitoring (the process of gathering telemetry data on the operation of an IT environment to gauge performance and troubleshoot issues) is a perfect example of how open source acts as both a driver and an enabler of DevOps methodologies.

What do these error codes mean?

The other day, whilst using a very popular website, I came across a series of 404 "Page Not Found" messages. I didn't think much about it at the time, but on reflection it made me wonder how many people actually understand what the different error codes mean. Hands up: I only know a few, and I work in the website monitoring sector. To most people, an error code is just a weird IT message that appears when things go wrong.
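As a quick way to decode these messages, the sketch below uses Python's standard-library `http.HTTPStatus` enum to look up the reason phrase for a status code and labels it with its class (1xx informational through 5xx server error). The function name `describe_status` and the `CLASSES` table are illustrative choices for this example, not part of any monitoring product.

```python
from http import HTTPStatus

# The five HTTP status-code classes, keyed by the code's first digit.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a short, human-readable description of an HTTP status code."""
    category = CLASSES.get(code // 100, "unknown class")
    try:
        phrase = HTTPStatus(code).phrase  # e.g. "Not Found" for 404
    except ValueError:
        phrase = "unregistered code"  # not in Python's HTTPStatus table
    return f"{code} {phrase} ({category})"


for code in (200, 301, 404, 500, 503):
    print(describe_status(code))
# 200 OK (success)
# 301 Moved Permanently (redirection)
# 404 Not Found (client error)
# 500 Internal Server Error (server error)
# 503 Service Unavailable (server error)
```

The class split is the useful rule of thumb: a 4xx code points at the request (a missing page, for instance), while a 5xx code points at the server.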

Keep stakeholders in the know with Incident Timeline from Opsgenie

Technology is changing the world faster than ever. Thanks in part to the rise of the Software-as-a-Service (SaaS) model, customers have come to expect the apps they use to be accessible at all times. As a result, companies are transforming the way their teams operate in order to meet these demands. And perhaps no team experiences the impact of a transformation like this more than IT.

Prometheus v2.11 Released

Since graduating from the CNCF last August, Prometheus has adopted a six-week release schedule. The latest release, v2.11, arrived on July 9. Prometheus 2.11 includes a new option to compress WAL records using Snappy, query performance improvements, the option to use the Alertmanager API v2, and more. Note that the metric prometheus_tsdb_wal_reader_corruption_errors has been renamed to prometheus_tsdb_wal_reader_corruption_errors_total.
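Because of the metric rename, any saved queries, alerting rules, or dashboards that reference the old name need updating. The sketch below is a hypothetical migration helper (`migrate_expr` is an illustrative name), assuming your expressions are available as plain strings. It uses a word-boundary regex rather than a PromQL parser, so it would also rewrite the old name if it appeared inside a string literal such as a label value.

```python
import re

OLD_NAME = "prometheus_tsdb_wal_reader_corruption_errors"
NEW_NAME = "prometheus_tsdb_wal_reader_corruption_errors_total"

# \b on both sides: the trailing boundary does not match before "_total"
# (because "_" is a word character), so already-migrated names are left alone.
_OLD_METRIC = re.compile(rf"\b{re.escape(OLD_NAME)}\b")


def migrate_expr(expr: str) -> str:
    """Replace the pre-2.11 metric name with its new *_total name."""
    return _OLD_METRIC.sub(NEW_NAME, expr)


print(migrate_expr("rate(prometheus_tsdb_wal_reader_corruption_errors[5m]) > 0"))
```

Running the helper twice is harmless: the second pass finds no old names to replace.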

An Introduction to Python List Comprehensions

Python list comprehensions offer a concise way to build a new list by applying an expression to each element of an existing list (or any other iterable), optionally filtering elements along the way. Even though they've been available since Python 2.0, their syntax often discourages people from using them. This article aims to introduce list comprehensions in a friendly way and offer you one more Python feature to add to your scripting toolbox.
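A short example makes the syntax less intimidating. Below, the same result (the squares of the even numbers in a list) is computed first with an explicit loop and then with a single list comprehension; the variable names are just illustrative.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Explicit loop: filter the even numbers, then square them.
squares_of_evens_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_of_evens_loop.append(n * n)

# Equivalent list comprehension: [expression for item in iterable if condition]
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

print(squares_of_evens)  # [4, 16, 36]
```

Reading the comprehension left to right ("n squared, for each n in numbers, if n is even") mirrors the loop, just without the temporary list and the `append` calls.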

I Came, I Saw, I Monitored: Troubleshoot Unified Communications Like a Roman Emperor

“We were born to work together like feet, hands, and eyes, like the two rows of teeth, upper and lower … like Cisco HCS, Nortel, or Skype for Business and our distributed development teams.”
Marcus Aurelius, Roman Emperor, Unified Comms Futurist*

*(not really)

OK, so, the famed Roman emperor may not have mentioned technology in his A.D.

Mark Henderson from Stack Overflow shares his experience on being an SRE

Mark Henderson has been a Site Reliability Engineer at Stack Overflow since 2015. Before this he worked as the sole systems administrator at a small software company in Sydney, Australia. These days, he lives in South Australia and works from home with his wife and two children.