
The latest news and information on incident management, on-call, incident response, and related technologies.

How retailers are improving productivity, transforming incident response, and empowering teams with PagerDuty

For retailers, uptime is money, and issues can cost thousands of dollars per minute. With infrastructure made up of complex services such as payment gateways, inventory, and mobile applications, maturing digital operations is vital to keeping services always on and giving customers the best experience.

Winning on Black Friday - IT Incident Response Made Simple

Even with all the changes in consumer behavior due to COVID-19, Black Friday and Cyber Monday are here to stay. Social distancing measures that limited in-store shopping in 2020 have only led more people to shop online, and this trend is expected to continue in 2021. Preparing your e-commerce website and business for the seasonal surge around Black Friday and Cyber Monday 2021 is crucial.

Why Net at Work employees are sleeping soundly again

Net at Work is a German IT company with over 100 employees that provides its customers with solutions and tools for digital communication and collaboration. Its product NoSpamProxy offers reliable protection against spam and ransomware, legally compliant email encryption, and more. Net at Work's customers use NoSpamProxy as a SaaS solution, and the service is monitored with PRTG Network Monitor, agentless network monitoring software from Paessler AG.

Divisions of Family Practice Adopts OnPage to Enhance Clinical Communication

Effective healthcare communication requires proper software and processes to ensure that the right person receives timely messages. Unfortunately, Divisions of Family Practice (DoFP), a large community-based network of physicians located in British Columbia, Canada, relied on a third-party answering service to connect long-term care facilities (LTCFs) with on-call providers.

What is expected in the SRE role? We analyzed 30 job postings to find out.

In 2016, Google released the definitive book on Site Reliability Engineering (SRE), a practice that originated at the company to solve a monumental problem: how to keep Google's services running with high reliability. Over the years, SRE has been widely adopted by dev teams across the globe and has become a popular role at startups and enterprises alike. Here is a look at how searches for SRE have trended over the years.

How Do I Add a Major Incident Response to an Existing Integration? - Ask Adam

When we receive an alert, the obvious choice is to accept responsibility for the issue and start resolving it ourselves. But what happens when the incident is far more serious than we thought? With xMatters, you don't have to scramble to find out who else is on call; you can configure the platform to find other responders for you.

3 Ways to Use the xMatters and Microsoft Azure Monitor Integration

For a number of years, the debate over DevOps vs. ITIL has divided many technology teams. On the surface, the two practices seem at odds with one another: DevOps harnesses the power of human collaboration and communication to support innovation, while ITIL takes a more systematic and structured approach to delivering service quality and consistency. But a deeper look shows that DevOps and ITIL not only can co-exist but can even complement each other.