
Navigating External Outages: How Selector Cuts Through the Cloudflare Noise

Yesterday’s widespread Cloudflare outage is a reminder of how much the stability of our own applications depends on external providers. When a key edge provider like Cloudflare goes down, the impact can look to your internal monitoring systems like a catastrophic internal failure, triggering a massive storm of alerts and sending engineering teams into frantic, misdirected debugging sessions.

OnlineOrNot's lessons from Cloudflare's outage on 2025-11-18

On 2025-11-18 at 11:48 UTC, Cloudflare declared an incident affecting its global network, which also affected OnlineOrNot. OnlineOrNot monitors websites, APIs, web apps, and cron jobs, and also provides status pages. Although we partially mitigated the issue by enabling a fallback to AWS-based monitoring, between 13:00 UTC and 14:33 UTC failing checks went unreported, heartbeat checks over-reported, and status pages were unavailable.

Five ITOps best practices to stay ahead during major third-party outages

When external providers fail (whether it was the CrowdStrike outage last year, the AWS outage last month, or the Cloudflare DNS outage yesterday), the symptoms inside your environment often look like internal issues: timeouts, login failures, API errors, service degradation, or sudden spikes in dependency-related alerts. It’s natural for teams to start searching through their own infrastructure first, but none of these symptoms clearly points to your systems as the root cause.

AI-Suggested Alert Thresholds for Mobile Telemetry

Life is pretty good. I’ve shipped a mobile app and I’m (happily) drowning in telemetry. Battery impact, time in foreground/background per screen, crash rates, slow frames, network retries – the works. The data is brilliant; the challenge is turning signals into reliable alerts that catch real issues which are relevant to my app’s functions. So… what should I actually listen for, and where should I set the thresholds?

Best Cheap Black Friday VPS Deals - November 2025: A Cost-Based Analysis

It is November 2025, Black Friday is here, and the VPS hosting world is getting ready for its biggest sale event of the year. VPS deals with huge discount percentages will appear on every website, but the real savings aren't always what they look like at first glance. This guide focuses on total cost and presents a few different options to consider.

The Life Cycle of Data, From Creation to Erasure

Data doesn't just exist - it moves through a predictable, high-stakes life cycle that shapes how securely and efficiently businesses operate. Understanding each phase, from initial creation to final erasure, enables organizations to strengthen governance, mitigate risk and support informed decision-making. Leaders should break down the full life cycle of data to better protect their assets and optimize the flow of information throughout the enterprise.

Cloud Security Best Practices Every Company Should Follow

As more businesses move their data, applications, and daily operations to the cloud, securing that environment has become a top priority. Cloud platforms offer flexibility, scalability, and cost savings, but they also introduce shared responsibility, meaning both the provider and the business must play a role in keeping systems safe. Understanding essential cloud security best practices helps organizations reduce risk, protect sensitive information, and maintain compliance in an increasingly digital world.

AIOps - Laying a Strong Foundation with Full-Stack Observability

It is fair to say that AIOps is much more than a catchy tagline; it is now a fundamental part of how enterprises manage modern, cloud-native architectures and distributed systems. As AIOps becomes more widely adopted and organizations expand, the volume of logs, metrics, and traces becomes too much for role-based tracking and monitoring tools. This is the point at which full-stack observability tools are needed: they provide the data that AIOps engines rely on for predictive, proactive detection of performance issues.

Fast and Clear Presentations for Tech Teams: Saving Time While Explaining Complexity

We've all been there. Technical complexity is our daily bread, but explaining it? That's where things get messy. Technical presentations shouldn't feel like translating ancient hieroglyphs. Yet here we are, drowning audiences in diagrams that look like subway maps. The irony? We build systems for clarity and efficiency, then create presentations that achieve neither. Modern IT teams face a presentation paradox. We need to move fast, really fast, while ensuring everyone actually understands what we're building, breaking, or fixing.

Technical Documentation and Language Skills: How Learning Foreign Languages Improves Understanding of Technical Documentation and SOPs

Technical documentation is the backbone of safe and efficient work. Yet many professionals approach SOPs and manuals as if they were an unavoidable chore that is easy to misread. What often goes unnoticed is how profoundly foreign-language learning can transform the way technical content is processed. Language study is not simply about words; it reshapes the brain's ability to decode structure and logic. As industries stretch across borders and digital workflows standardize processes worldwide, this linguistic edge becomes almost a secret superpower.