
Latest News

What are the Best Practices to Improve the Incident Management Process?

DevOps and IT Operations teams use the incident management process to respond to an unanticipated event or service outage and return the service to operational status. In the ITIL framework, it is a mechanism that links end users and the IT department for more effective incident response. A robust incident management system allows employees to raise a ticket describing the issue they are facing.

Routing alerts from AWS Elastic Beanstalk via CloudWatch

Amazon Web Services (AWS) offers 100+ services, each focusing on a specific area of functionality. However, it can be challenging to pick the right services for a task and to provision them. AWS Elastic Beanstalk lets you deploy and manage applications without having to learn about the underlying infrastructure that runs them.

What's New: Updates to Incident Response, PagerDuty Process Automation Software & PagerDuty Runbook Automation, Integrations, and More!

We’re excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud. Recent updates from the product team cover Incident Response, PagerDuty® Process Automation, and Community & Advocacy events. We continue to help customers automate further, optimize cloud operations, and reduce the number of issues escalated to other teams.

Fast track video series: Extracting alert data from emails

With BigPanda’s self-service Email Parser, extracting alert data from emails has never been simpler. In the latest video in our Fast track series, we explore the benefits of this tool. The parser is ideal for monitoring tools and systems that do not support a REST API or that rely solely on email to generate and send alerts. No matter which tools your organization uses, this feature can help you turn those alert emails into actionable incidents within BigPanda’s platform.

PagerDuty and DataOps: Enabling Organizations to Improve Decision Making with Better Data

Many organizations have been digitally transforming their operations, and most of them are moving to the cloud. With this transformation, data teams have to analyze ever-larger and more complex data sets so that downstream teams can make faster and more accurate decisions every day. As a result, most organizations need to work with customer data, product data, usage data, advertising data, and financial data.

Do You Understand Your Essential Business Processes?

Before you can choose the right tools for your organization, you have to understand its essential business processes. Once you understand those processes, you can review software applications that will make your organization more efficient and accurate. Unfortunately, many organizations do not understand their essential business processes. This makes it nearly impossible for them to streamline their operations and puts them at a disadvantage in the marketplace.

A deep dive into event correlation

Event correlation is a powerful capability that can help reduce IT noise, detect incidents in real time, and improve the performance of critical applications and services. Read on for a deep dive into event correlation as we explore everything from its origins to today's state-of-the-art techniques. We’ll also discuss how event correlation fits into the bigger picture of integrated service management.

The Roblox Outage

Just before Halloween 2021, Roblox engineers experienced a horror story: a service outage that also took down critical monitoring systems. At first the issue looked like a hardware problem, but it wasn’t. Users were frustrated, and the clock was ticking. After three full days of downtime, service was finally restored on Halloween. While the incident itself was an IT nightmare, the detailed technical post-mortem Roblox published several months later was an excellent way to bounce back.