Operations | Monitoring | ITSM | DevOps | Cloud

Hurricane Helene Devastates Network Connectivity in Parts of the South

In this post, we dig into the impacts from Hurricane Helene which came ashore late last month wreaking destruction and severe flooding in the Southeastern United States. Using Kentik’s traffic data as well as Georgia Tech’s IODA, we detail the impacts in three of the hardest-hit states: Georgia, South Carolina, and North Carolina.

Understanding Java Logs

Logs are the notetakers for your Java application. In a meeting, you might take notes so that you can remember important details later. Your Java logs do the same thing for your application. They document important information about the application’s ability to function and problems that keep it from working as intended. Logs give you information to help fix coding errors, but they also give your end users information that helps them monitor performance and security.

Redefining RUM: A Comparative Gap Analysis of Existing Tools

Real user monitoring (RUM) began as a straightforward approach to tracking basic web performance metrics. Focused on things like page load times and response rates, RUM relied on server-side logging and simple browser timings. While these tools captured Core Web Vitals (CWVs), they offered limited insights into how users actually interacted with pages, focused mainly on server-side performance.

Private Cloud Providers: 10 Best Options And Key Features to Consider

While not every organization will opt for a private cloud, those who do must navigate a challenging market with numerous options. But what exactly are private cloud providers? How do they differ from other options, like public or hybrid cloud models? Understanding these distinctions is essential for selecting a provider that meets your organization's specific needs and strategic goals. Let's explore how the private cloud works, the features it provides, and what to look for when choosing a provider.

Guide to incident response metrics and KPIs

IT incident management focuses on quickly identifying and resolving IT issues to restore normal service operations. Tracking key performance indicators (KPIs) of incident response is vital in minimizing service disruptions affecting customers and users. With so much data and many things to track, it’s difficult to identify which metrics and KPIs are right to track. What are the right incident response metrics to use to drive meaningful improvements?

Being Operationally Mature Can Save You Millions

On July 19th, a widespread technical failure crippled operations across industries, resulting in lost revenue, wasted operating costs, and damaged customer trust. For businesses that had built trust by providing reliable and resilient services, this had both an immediate and a lasting impact.

Introducing Enhancements to the PagerDuty Operations Cloud: Building Operational Resilience for the Modern Enterprise

Global outages and disruptions have become an inevitable reality for the modern enterprise. As digital dependencies deepen, organizations must effectively manage disruptions or risk damage to their customer experience, brand reputation, and bottom line. Today, we’re thrilled to unveil the latest innovations for the PagerDuty Operations Cloud.

What Is A Network Drop: Solving Drops in Networks

Network drops can seriously impact business operations, leading to lost productivity, communication breakdowns, and even financial losses. Whether you're managing critical systems, supporting remote teams, or delivering services to customers, a stable network is essential for maintaining business continuity. But what causes these network drops? How can you fix them? And most importantly, how can you prevent them from happening again?

Introducing the Observability Center of Excellence: Taking Your Observability Game to the Next Level

Chasing false alerts — or worse, having your system go down with no alerts or telemetry to give you a heads-up — is the nightmare we all want to avoid. If you’ve experienced this, you’re not alone. Before joining Splunk, I spent 14 years as an observability practitioner and leader for several Fortune 500 companies and in my 2.5 years with Splunk I have had the opportunity to work with customers of all shapes and sizes.

Autoscaling in Cloud Computing

Autoscaling in cloud computing is the ability of a system to adjust its resources in response to changes in demand automatically. This guarantees that applications always have the resources they need to perform optimally, even during periods of high traffic. Autoscaling eliminates manual intervention, allowing your dev team time to focus on your product. All major cloud providers like AWS, Azure, and Google Cloud Platform offer robust autoscaling solutions with many features and capabilities.