Latest News

Navigating External Outages: How Selector Cuts Through the Cloudflare Noise

Nov 19, 2025 By Stephen Ochs In Selector

Yesterday’s widespread Cloudflare outage reminds us how crucial external dependencies are to the stability of our own applications. When a key edge provider like Cloudflare goes down, the impact on your internal monitoring systems can look like a catastrophic internal system failure, triggering a massive storm of alerts and sending engineering teams into frantic, misdirected debugging sessions.

OnlineOrNot's lessons from Cloudflare's outage on 2025-11-18

Nov 19, 2025 By Max Rozen In OnlineOrNot

On 2025-11-18 at 11:48 UTC, Cloudflare declared an incident affecting the global network, which also affected OnlineOrNot. OnlineOrNot monitors websites, APIs, web apps, and cron jobs, and also provides status pages. While we partially mitigated the issue by enabling a fallback to AWS-based monitoring, between 13:00 UTC and 14:33 UTC failing checks went unreported, heartbeat checks over-reported, and status pages were unavailable.

Five ITOps best practices to stay ahead during major third-party outages

Nov 19, 2025 By Adam Blau In BigPanda

When external providers fail (whether it was the CrowdStrike outage last year, the AWS outage last month, or the Cloudflare DNS outage yesterday), the symptoms inside your environment often look like internal issues: timeouts, login failures, API errors, service degradation, or sudden spikes in dependency-related alerts. It’s natural for teams to start searching through their own infrastructure first, but none of these symptoms clearly point to your systems as the root cause.

AI-Suggested Alert Thresholds for Mobile Telemetry

Nov 19, 2025 By Lewis Isaac In Coralogix

Life is pretty good. I’ve shipped a mobile app and I’m (happily) drowning in telemetry. Battery impact, time in foreground/background per screen, crash rates, slow frames, network retries – the works. The data is brilliant; the challenge is turning signals into reliable alerts that catch real issues which are relevant to my app’s functions. So… what should I actually listen for, and where should I set the thresholds?

Best Cheap Black Friday VPS Deals - November 2025: A Cost-Based Analysis

Nov 19, 2025 By OpsMatters In OpsMatters

It is November 2025, Black Friday is here, and the VPS hosting world is getting ready for its biggest sale event of the year. VPS deals with huge discount percentages will appear across every website, but the real savings aren't always what they look like at first glance. This guide focuses on total cost and presents a few options to consider.

The Life Cycle of Data, From Creation to Erasure

Nov 19, 2025 By OpsMatters In OpsMatters

Data doesn't just exist - it moves through a predictable, high-stakes life cycle that shapes how securely and efficiently businesses operate. Understanding each phase, from initial creation to final erasure, enables organizations to strengthen governance, mitigate risk and support informed decision-making. Leaders should break down the full life cycle of data to better protect their assets and optimize the flow of information throughout the enterprise.

Cloud Security Best Practices Every Company Should Follow

Nov 19, 2025 By OpsMatters In OpsMatters

As more businesses move their data, applications, and daily operations to the cloud, securing that environment has become a top priority. Cloud platforms offer flexibility, scalability, and cost savings, but they also introduce shared responsibility, meaning both the provider and the business must play a role in keeping systems safe. Understanding essential cloud security best practices helps organizations reduce risk, protect sensitive information, and maintain compliance in an increasingly digital world.

AIOps - Laying a Strong Foundation with Full-Stack Observability

Nov 19, 2025 By Puneet Ramaul In OpsMatters

It is fair to say that AIOps is much more than just a catchy tagline; it is now a fundamental capability for every enterprise managing a modern, cloud-native architecture and distributed systems. As AIOps becomes more widely adopted and organizations expand, the volume of logs, metrics and traces becomes too much for role-based tracking and monitoring tools. This is where full-stack observability tools are needed: they provide the data that AIOps engines rely on for predictive and proactive detection of performance issues.

Fast and Clear Presentations for Tech Teams: Saving Time While Explaining Complexity

Nov 19, 2025 By OpsMatters In OpsMatters

We've all been there. Technical complexity is our daily bread, but explaining it? That's where things get messy. Technical presentations shouldn't feel like translating ancient hieroglyphs. Yet here we are, drowning audiences in diagrams that look like subway maps. The irony? We build systems for clarity and efficiency, then create presentations that achieve neither. Modern IT teams face a presentation paradox. We need to move fast, really fast, while ensuring everyone actually understands what we're building, breaking, or fixing.

Technical Documentation and Language Skills: How Learning Foreign Languages Improves Understanding of Technical Documentation and SOPs

Nov 19, 2025 By OpsMatters In OpsMatters

Technical documentation is the backbone of safe and efficient work. Yet many professionals approach SOPs and manuals as if they were an unavoidable chore that is easy to misread. What often goes unnoticed is how profoundly foreign-language learning can transform the way technical content is processed. Language study is not simply about words; it reshapes the brain's ability to decode structure and logic. As industries stretch across borders and digital workflows standardize processes worldwide, this linguistic edge becomes almost a secret superpower.
