Latest News

How and When to Inform Website Users of a Data Breach

Data breaches don’t wait for a convenient time to strike, and they can take months to uncover. They are complicated beasts, and once you’ve uncovered one, complex rules determine when you need to report it. Reporting a breach can be a daunting prospect: in most cases you’ll need to make a public statement, and you may also face legal reporting requirements.

Behind the Grafana UX: Redesigning the Thresholds Editor

As part of building the new Gauge panel in React, we also wanted to update the panel controls, especially the thresholds control. A threshold in the context of Grafana is simply a value that triggers a condition when it is exceeded. An example would be a single stat panel with a green background that changes its background color to red when a threshold is breached.
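
To make the idea of a threshold concrete, here is a minimal Python sketch of the green-to-red example above. It is not Grafana's implementation; the function name `color_for`, the limit of 80, and the color names are all assumptions chosen for illustration.

```python
def color_for(value, base_color="green", thresholds=((80.0, "red"),)):
    """Pick a display color for a value.

    `thresholds` is a hypothetical collection of (limit, color) pairs.
    The color of the highest limit that the value exceeds wins;
    if no limit is exceeded, the base color is used.
    """
    color = base_color
    # Walk the limits from lowest to highest so the last match is the highest one.
    for limit, threshold_color in sorted(thresholds):
        if value > limit:
            color = threshold_color
    return color


if __name__ == "__main__":
    for reading in (50, 80, 95):
        print(reading, color_for(reading))
```

Because the comparison is strict, a reading that exactly equals the limit keeps the base color. Whether a real panel treats the boundary as inclusive is a design choice this sketch does not settle.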

How to Secure a Kubernetes Cluster

Kubernetes is one of the most advanced orchestration tools that currently exists in the software world. It provides out-of-the-box automation for environment maintenance and simplifies deployment and upgrade processes. It has different implementation types (on-premises, cloud-managed, hybrid, and more), multiple open-source supporting tools, and a wide range of configuration options.

Achieve better AWS security with just 10 Cloudtrail logs alerts

CloudTrail logs track actions taken by a user, role, or an AWS service, whether taken through the AWS console or API operations. In contrast to on-premises infrastructure, where something as important as network flow monitoring (NetFlow logs) could take weeks or months to get off the ground, AWS can track flow logs with a few clicks at relatively low cost.

Summit Day Two: New Integrations and Developer Platform to Bring Real-Time Work to More People

Yesterday, we kicked off PagerDuty Summit by launching new features that support the themes of Visibility and Intelligence. If you missed the keynotes or want to know more, check out this blog post. Today, we are making several announcements around two other themes that our CEO Jennifer Tejada touched on during her keynote yesterday: Platform and People. In fact, these themes are so closely related that we refer to them as one: PagerDuty is a platform for people to do real-time work.

CIO Dive Playbook: AIOps Brings Calm to Overwhelmed IT Ops Teams

Much has been said about how Artificial Intelligence (AI) is already proving its ability to transform business, as well as the way most people live. In fact, according to Accenture’s “ExplAIned: A Guide for Executives,” AI is on par with such life-changing innovations as electricity and the internal combustion engine, and is no longer science fiction.

Slack Loses $8M to Outages

On July 22, 2019, Slack was in the middle of deploying an update to its desktop app. The update was supposed to decrease memory consumption and improve load time, but instead the company suffered a significant global outage. After approximately 40 minutes of downtime, the service was back up. But in the meantime, the company whose motto is ‘where work happens’ essentially stopped working.

Why Traditional Kubernetes Monitoring Solutions Fail

Kubernetes has several key differences that push the limits of traditional application monitoring. Because Kubernetes is distributed and ephemeral, most existing solutions fail to give the visibility we might expect, resulting in longer resolution times. Looking at these potential pitfalls can help guide us as we take a fresh look at Kubernetes management and monitoring.