
3 common pitfalls of post-mortems

Small confession: we currently use the term 'post-mortem' in incident.io despite preferring the term 'incident debrief'. Unless you have particularly serious incidents, the link to death here really isn’t helping anyone. However, we're optimising for familiarity, so we're sticking to the term 'post-mortem' here. Ask any engineer and they’ll tell you that a post-mortem is a positive thing (despite the scary name).

Quick Start: Telegraf's Starlark Processor Plugin

After a mortgage payment, energy costs are typically the largest household expense. In my case it was an easy decision to install solar panels, but I wanted to perform in-depth analyses with historical data. Deploying monitoring sensors was straightforward; collecting and processing the raw data became the main challenge. Telegraf and InfluxDB are ideal choices for managing time series data. Although I had no prior experience with either, I had a Docker instance of Telegraf up and running in no time.

SOC 2: Data Security For Cloud-Based Observability

As more companies adopt SaaS services over on-premise delivery models, there is a natural concern around data security and platform availability. Words on a vendor’s website can provide insights to prospective customers on the process and policies that companies have in place to alleviate these concerns. However, the old adage of “actions speak louder than words” does apply. Trust in a website’s words only goes so far.

Hybrid Network Triage for the New Enterprise Network

We all know that cloud and SaaS adoption continues to grow rapidly, often outpacing budgets. In fact, spending on IaaS and SaaS exceeded budgets in more than 40% of organizations in 2021. As a result, network traffic is now spending much more time on the internet than in our own data centers. The internet has become the new enterprise network.

Zero Trust Security: Key Concepts and 7 Critical Best Practices

Zero trust is a security model that helps secure IT systems and environments. The core principle of this model is to never trust and always verify. It means never trusting devices by default, even those that are connected to a managed network or were previously verified. Modern enterprise environments include networks consisting of numerous interconnected segments, services, and infrastructure, with connections to and from remote cloud environments, mobile devices, and Internet of Things (IoT) devices.

solr-reindexer: Quick Way to Reindex to a New Collection

If you’re using Solr, there are bound to be times when you change the schema and need to reindex. Quite often the source of truth is a database, so you can use streaming expressions via the JDBC source to reindex. But sometimes that’s not possible, or it adds too much load to the DB. So how can we use Solr itself as a source?

Moogsoft Green Credentials

Waste is never a good thing. And rumblings of an economic downturn, alongside dire warnings of climate change, are making it increasingly necessary to address waste. As a society, we need to reduce consumption, data included. First, we all must acknowledge the high cost of data. Despite the prevailing opinion of the 2010s, data isn’t free. There’s a monetary and carbon cost to keeping data alive.

"Why Are My Tests So Slow?" A List of Likely Suspects, Anti-Patterns, and Unresolved Personal Trauma

“Lead time to deploy” means the interval from when the code gets written to when it’s been deployed to production. It has also been described as “how long it takes you to run CI/CD.” How important is it? It’s nigh-on impossible to have a high-performing team if you have a long lead time, and shortening your lead time makes your team perform better, both directly and indirectly.

Welcome to the Future of Data Search & Exploration

You have more data coming at you than ever before. Over the next five years, the total amount of digital data created is going to be more than twice the amount created since the advent of digital storage. With the success of your company often determined by how you anticipate and respond to threats – and leverage meaningful insights – you need the ability to quickly search and find insights in your data, despite this increasing deluge of information.

Automating Common Diagnostics for Kubernetes, Linux, and other Common Components

This is the second piece in a series about automated diagnostics, a common use case for the PagerDuty Process Automation portfolio. In the last piece, we covered the basics of automated diagnostics and how teams can use the solution to reduce escalations to specialists and empower responders to take action faster. In this blog, we’re going to talk about some basic diagnostics examples for the components that are most relevant to our users.