The Biggest Website Outages of All Time

As much as we all love the internet and everything it offers, we’ve also all experienced that sinking feeling when we try to access our favorite website, only to find it’s down. If you run your own site, you know that uptime is crucial for your online success — so that sinking feeling in your chest when your own website is down is … well, even worse. But let’s face it: even the internet giants aren’t immune to outages.

Announcing incident.io Status Pages - powering clear external comms to build trust

Clear and frequent communication carries considerable weight in today's era of hyper-competition among businesses—especially during incidents. Because of this, status pages have become the go-to choice for companies looking to prioritize trust, transparency, and clarity with their customers, even during downtime. Unfortunately, current status page solutions have made these communications particularly frustrating and stressful.

Grafana Cloud is now available in AWS Marketplace

Grafana Labs is excited to announce that Grafana Cloud is now available in AWS Marketplace. With this new offering, existing AWS customers can procure, deploy, and scale the fully managed Grafana LGTM observability stack (Loki for logs, Grafana for visualization, Tempo for traces, Mimir for Prometheus metrics) with just a few clicks.

DevOps Pulse 2023: Increased MTTR and Cloud Complexity

Evolving DevOps maturity, mounting mean time to recovery (MTTR), and perplexing cloud environments: all of these factors are shaping modern observability practices, according to approximately 500 observability practitioners. While every organization faces its own challenges, some trends have a broad impact.
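For readers unfamiliar with the metric, MTTR is usually computed as the average time between an incident starting and being resolved. The sketch below is only an illustration of that arithmetic: the function name `mean_time_to_recovery` and the two sample incidents are hypothetical and are not taken from the survey.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average recovery time over (started, resolved) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual recovery durations, starting from a zero timedelta.
    total = sum(
        (resolved - started for started, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical sample data: one 30-minute and one 90-minute incident.
incidents = [
    (datetime(2023, 1, 1, 10, 0), datetime(2023, 1, 1, 10, 30)),
    (datetime(2023, 1, 5, 14, 0), datetime(2023, 1, 5, 15, 30)),
]

mttr = mean_time_to_recovery(incidents)
print(mttr)  # 1:00:00
```

Teams differ on whether the clock starts at detection or at the first customer impact, so the choice of `started` timestamp is an assumption that should be stated alongside any MTTR figure.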

Increasing Implications: Adding Security Analysis to Kubernetes 360 Platform

A quick look at headlines emanating from this year’s sold-out KubeCon + CloudNativeCon Europe underlines the fact that Kubernetes security has risen to the fore among practitioners and vendors alike. As is typically the case with our favorite technologies, we’ve reached the point where people are determined to ensure security measures aren’t “tacked on after the fact” when it comes to the wildly popular container orchestration system.

Elastic Common Schema and OpenTelemetry - A path to better observability and security with no vendor lock-in

At KubeCon Europe, it was announced that Elastic Common Schema (ECS) has been accepted by OpenTelemetry (OTel) as a contribution to the project. The goal is to achieve convergence of ECS and OpenTelemetry’s Semantic Conventions (SemConv) into a single open schema that is maintained by OpenTelemetry. This FAQ details Elastic’s contribution of Elastic Common Schema to OpenTelemetry, how it will help drive the industry to a common schema, and its impact on observability and security.

Lightstep from ServiceNow deepens commitment to OpenTelemetry project

At Lightstep, we’ve seen many organizations grapple with “cloud-native sticker shock” as they come to understand that these complex systems require sifting through massive amounts of data across architectures and proprietary solutions. In today’s macroeconomic environment, organizations are looking to reduce costs while driving innovation, especially when it comes to cloud-native applications.

IT Incidents vs. Alerts

IT incidents are events that disrupt, or cause a deviation from, the regular operating standards of a computer system or network. They can be caused by various factors, including hardware or software failures, human error, or even deliberate external (cybersecurity) attacks. An incident often begins with short delays or services cutting out, for example when a website or server is down, or when access to data or databases takes too long.

The Three Pillars of Observability: Metrics, Logs and Traces

Metrics, logs, and traces are often referred to as the three pillars of observability. The term “observability” comes from control theory, where it refers to how well the state of a system can be inferred from the system’s external outputs. Applied to IT, observability is how well the current state of an application can be assessed based on the data it generates. Applications and the IT components they use provide outputs in the form of metrics, events, logs, and traces (MELT).
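To make the three signal types concrete, here is a minimal sketch that uses only the Python standard library rather than any real observability SDK. Every name in it (`metrics`, `spans`, `span`, `handle_request`, the `checkout` logger, and the order ID) is hypothetical and chosen purely for illustration.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")  # hypothetical service name

metrics = Counter()  # metrics: numeric values aggregated over time
spans = []           # traces: timed spans that share one trace ID


@contextmanager
def span(name, trace_id):
    """Record how long a unit of work took as part of a trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_s": time.perf_counter() - start,
        })


def handle_request(order_id):
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id):
        metrics["requests_total"] += 1               # metric
        log.info("processing order %s trace_id=%s",  # log
                 order_id, trace_id)
        with span("charge_card", trace_id):          # nested trace span
            pass  # placeholder for real work
    return trace_id


trace_id = handle_request("A-1")
```

The metric answers "how many requests?", the log line says what happened for one request, and the spans show where that request spent its time. Carrying the same trace ID in the log line and the spans is what lets the three signals be correlated afterwards.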