Operations | Monitoring | ITSM | DevOps | Cloud

Are you testing for known reliability vulnerabilities?

"Risks have different priorities, but ultimately we want to be aware of those risks. Just like we want our security team to go scan for known vulnerabilities, our reliability team should be scanning for known vulnerabilities. And those are easy things we should go address. There's a second part of it, which is kind of just good engineering testing, which is: Hey, we have a set of test cases that we know need to pass."

How to control your overage bills

We all know how tricky it can be to keep track of costs, especially when usage on your projects spikes or your users flock to your latest feature. That's why we've been working on a solution to ensure you never get a surprise bill for on-demand occurrences. Introducing our latest feature to give you both flexibility and control: Overage Budgets.

Are you Prepared for Your Next Major Outage?

Software is not perfect. And ultimately, it’s not a matter of if you will have an outage, but of when. With the increasing complexity and frequency of IT incidents, is your organization prepared to respond and recover when each second counts? Here at PagerDuty, we’ve compiled a list of best practices to keep your systems up and running.

Steps to AIOps maturity: Improve MTTR with AI

Many organizations face increased costs from excess noise, manual workflows, and long outage times. These inefficiencies negatively impact budget, service uptime, and, ultimately, customer satisfaction. With effective use of AI, you can give operators the most relevant, full-context incident data, providing a greater understanding of an incident within seconds.

Without AI, Your Telemetry Data Pipeline Sucks

History is filled with stories of human triumph. One of the most famous such stories is that of John Henry, “The Steel Driving Man.” As the traditional American folk story goes, John Henry and his fellow workers were faced with the arrival of the steam engine, which threatened to replace their manual labor. To prove that human strength and skill could outperform the new technology, John Henry challenged the machine to a contest.

To the Cloud and Back: When and How to Execute a Cloud Repatriation Effort

The past few years have been dominated by digital transformation characterized by a move away from legacy on-premises systems to the cloud. However, there are also instances when bringing certain assets back from the cloud – a process known as “cloud repatriation” – can be a strategic and cost-effective move. Questions persist about when cloud repatriation makes sense and how organizations should craft their strategy.

Effective Modern Patch Management Processes and Best Practices for Patch Operations

Running a risk-based vulnerability management program is essential to maintaining a secure business computing environment. In a previous blog, “How Implementing Risk-Based Patch Management Prioritizes Active Exploits,” I provided perspective on how to prioritize vulnerabilities. Honing the operational aspect of securing your systems is essential to that process. Conducting patch operations in your organization can be a complicated process.

The Secret To Blazing Fast Docker Builds

It's no overstatement to say Dockerfiles are the underpinning of modern DevOps. Writing a simple Dockerfile that 'works' is relatively straightforward, but there are several tips and tricks that can significantly improve the build speed and efficiency of your container images. If your current Dockerfiles copy in multi-gigabyte contexts, reinstall dependencies on every build, or use only a single stage, we need to talk.