
Latest Blogs

What is Vendor Management

Managing vendors efficiently is crucial for any business aiming to maintain smooth operations and achieve its goals. Whether you’re working with suppliers for raw materials, software services, or anything in between, effective Vendor Management ensures that these relationships contribute positively to your organization. In this article, we’ll explore what Vendor Management is, the challenges it presents, the detailed processes involved, and the tools that can help.

How to test AWS managed services with Gremlin

Note: In this blog, we use “managed service providers” to refer to companies that provide hosted computing services, not managed IT service providers (MSPs). When was the last time you thought about the reliability of your cloud dependencies? The biggest challenge with using cloud platforms and SaaS services is also their biggest strength: the provider controls everything.

The Secret To Blazing Fast Docker Builds

It's no exaggeration to say Dockerfiles are the underpinning of modern DevOps. Writing a simple Dockerfile that 'works' is relatively straightforward, but there are several tips and tricks that can significantly improve the build speed and efficiency of your container images. If your current Dockerfiles copy in multi-gigabyte contexts, reinstall dependencies on every build, or use only a single stage, we need to talk.

Effective Modern Patch Management Processes and Best Practices for Patch Operations

Running a risk-based vulnerability management program is essential to maintaining a secure business computing environment. In a previous blog, “How Implementing Risk-Based Patch Management Prioritizes Active Exploits,” I provided perspective on how to prioritize vulnerabilities. Honing the operational aspect of securing your systems is essential to that process. Conducting patch operations in your organization can be a complicated process.

To the Cloud and Back: When and How to Execute a Cloud Repatriation Effort

The past few years have been dominated by digital transformation characterized by a move away from legacy on-premises systems to the cloud. However, there are also instances when bringing certain assets back from the cloud – a process known as “cloud repatriation” – can be a strategic and cost-effective move. Questions persist about when cloud repatriation makes sense and how organizations should craft their strategy.

Without AI, Your Telemetry Data Pipeline Sucks

History is filled with stories of human triumph. One of the most famous such stories is that of John Henry, “The Steel Driving Man.” As the traditional American folk story goes, John Henry and his fellow workers were faced with the arrival of the steam engine, which threatened to replace their manual labor. To prove that human strength and skill could outperform the new technology, John Henry challenged the machine to a contest.

Steps to AIOps maturity: Improve MTTR with AI

Many organizations face increased costs from excess noise, manual workflows, and long outage times. These inefficiencies negatively impact budget, service uptime, and, ultimately, customer satisfaction. With effective use of AI, you can give operators the most relevant, full-context incident data, providing a greater understanding of an incident within seconds.

Are you Prepared for Your Next Major Outage?

Software is not perfect. And ultimately, it’s not a matter of if you will have an outage, but of when. With the increasing complexity and frequency of IT incidents, is your organization prepared to respond and recover when each second counts? Here at PagerDuty, we’ve compiled a list of best practices to keep your systems up and running.