Latest News

Streamline communication workflows with the Datadog Slack App

Sharing information about the health and performance of an application is a critical part of any team’s daily workflow. That’s why we’re excited to announce the Datadog Slack App, which simplifies crucial communication tasks by deepening the integration between Datadog and Slack.

How to Construct a Reliability Model for your Organization

As you adopt SRE practices, you’ll find that there are optimization opportunities across every part of your development and operations cycle. SRE breaks down silos and helps learning flow through every stage of the software lifecycle. This forms connections between different teams and roles. Understanding all the new connections formed by SRE practices can be daunting. Building a model of SRE specific to your organization is a good way to keep a clear picture in your head.

How to: Automatically Archive Incident Slack Channels using conditions in FireHydrant Runbooks

FireHydrant’s Slack integration is a great way to speed up your incident response, especially if FireHydrant Runbooks automatically creates a channel in your Slack workspace for each incident. “But what happens after the incident?” First of all, you shouldn’t have to manually archive those Slack channels, especially when you don’t want them clogging up the Slack navigation bar.

How Our Latest Release Makes Your PagerDuty Experience Frictionless

In a world that’s always on, keeping services up and running isn’t just ideal—it’s mission-critical for all of PagerDuty’s customers. It’s not lost on us that serving as the central nervous system for digital operations at some of the world’s largest companies is no small job.

DevOps/SRE Model: Bursting the Developer's Bubble. Here's the CTO Perspective.

Many organizations are transitioning toward a DevOps operational model, where software developers are responsible for operating the applications they develop, instead of a centralized IT operations group. In this “CTO Perspective” interview we talk to BigPanda’s CTO Elik Eizenberg about the challenges in that transition, and what it takes to make it easier. Lean back and watch the interview, or if you prefer reading, take a few minutes to read the transcript.

Alerts out of your database (SQL, PowerShell, REST API)

Whether it be on the administrative side of the house or in a production environment, the digital world is not slowing down. In fact, it is speeding up by the second. Data is collected from a thousand different sources and often stored in the same number of places. Automating the collection, analysis, and augmentation of this data can be a cumbersome and time-consuming task, not to mention the revenue lost when it is not done.

The rise of 'Compliance-ops': Bridging the tech and compliance gap in iGaming

Kimberley Wadsworth gambled £36,000 in a fortnight, committing suicide shortly after the loss and leaving her mother homeless as a result. She started gambling in 2015, visiting brick-and-mortar shops and playing at online casinos. There was no one to promptly alert or save Kimberley from her dreadful destiny.

How to Reduce MTTR With PagerDuty and Puppet's Relay

DevOps and SRE teams are under intense pressure to reduce the mean time to recovery (MTTR) when resolving incidents. With the proliferation of cloud services and the increasing complexity of DevOps toolchains, engineers today need to not only learn how to use these services, but also troubleshoot them when an incident is raised at 2 a.m. The problem is, many incident response processes are still manual today—cobbling together runbooks and ad hoc scripts and orchestrating people to respond.

Modern IT Systems Have Outgrown Traditional Monitoring

Legacy monitoring tools fall short for SRE teams and DevOps pros tasked with maintaining uptime of key applications in modern, cloud-based IT systems. To have visibility and control over these environments, these teams must collect and analyze more granular, underlying system information — observability data. This article explains why the only way for SRE teams and DevOps pros to extract the necessary insights from this data is through the application of AI capabilities.

Difference between a team lead and an engineering manager and how to transition between these roles

Transitioning from a team lead role to an engineering manager role is tough, and you will experience many changes along the way. What happens when you become an engineering manager?