Latest News

How to avoid one of the biggest critical incident headaches (no aspirin required)

You probably know this situation all too (painfully) well... A critical incident has hit. You get the alert and the race is on! First headache up on deck: coordinating all the stakeholders. You call, you email, you text – some people are available, some are not. Some might be sleeping, or just getting up in a completely different time zone.

Site reliability engineering: Predictions for 2020

As we head into 2020, it's clear that DevOps has finally crossed the divide and gone mainstream. With DevOps firmly ingrained as a standard practice, we now look at how it will evolve. DevOps is driving more overall alignment between development and operations teams than has ever existed in the past. For developers, that means building and delivering impeccable apps to market quickly.

Lessons in Building Well-Formed Scrum and Kanban Teams

In the early days of Amazon, Jeff Bezos set a rule: teams shouldn’t be larger than what two pizzas can feed, no matter how large a company gets. Setting this rule of small teams meant individuals spent less time providing status updates to each other and more time actually getting stuff done. It also allowed team members more time to focus on continuous improvement. PagerDuty, like Amazon, has a strong culture of continuous improvement.

See Your PagerDuty Account Clearly in 2020

What better way to start off the new year than reflecting on the past 12 months and conducting a retrospective of your systems, processes, and culture at your organization? For instance, what did your overall incident response look like in 2019? Was it a smooth and streamlined process or did chaos reign during incident conference calls? But when burning sage and holding magic crystals don’t refresh your office vibes or your incident response process, PagerDuty University has got you covered.

Docker Commands Cheat Sheet

In this article I will highlight the 6 key Docker commands I use on a daily basis while using Docker in the real world. This is by no means an extensive list of commands; I kept it short on purpose so you could use it as a quick reference guide. I’ve also omitted the topic of building images and the commands associated with it.

OnPage Celebrates Successful Year

OnPage welcomes the new year with open arms. Though the team is excited for the new decade, we’d like to look back at our organizational growth and success in 2019. The previous year brought several Gartner mentions and the release of innovative new OnPage capabilities. This post discusses these notable accomplishments in detail.

The Role of Live Event Notifications in Your Incident Response Plan

According to a study from the University of Maryland, a hacking attack occurs every 39 seconds. During a quick coffee break, your systems could be attacked up to a dozen times. Depending on how your alerts are set up, you might miss a dozen or more notifications. Missed or delayed alerts, and the resulting slow responses, provide attackers with more time. Every minute provides attackers another opportunity to damage your systems or steal your data.

Five IT Trends to Look Forward to in 2020

New Year’s Eve marks the transition into a new decade, beginning with personal resolutions and expectations for 2020. The same is true in the IT industry, where support teams expect to adopt trending technologies to reduce their mean time to repair (MTTR) and improve incident resolution. This post provides an in-depth look at five trends and discusses how growing technologies streamline IT workflows in the new year.

Incident Alert Routing - Getting woken up only by alerts that matter to you

Site reliability engineers have one of the toughest roles, if not the toughest, in any organization. While dealing with incidents is one part of the job, the other is building reliable systems. Google’s SRE book sums up this approach nicely. One of the most important challenges for an SRE in balancing firefighting against toil reduction is alert noise.