Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Performing Postmortems & Postmortem Templates at Squadcast | SRE Best practices | Squadcast

Postmortems are a way to summarize how an incident was resolved once it is over. They are also a way to build a knowledge base of failures and fixes that can be shared across your team, helping to build a culture of shared learning and learning from failures.

Open-source storage for beginners with Ceph

Modern organisations have become reliant on their IT capabilities, and at the heart of that infrastructure is a growing need to store data, be it transactional databases, file shares, or burgeoning data lakes for business analytics. Traditionally, storage needs have been catered to by big iron hardware vendors, but over the last decade, more and more organisations have turned to open-source solutions such as Ceph running on commodity hardware.

Using StatusPage at Squadcast | SRE Best practices | Squadcast

Let your customers know how your Services are doing, without them having to ask you about it. One of the core principles of SRE is Transparency and Status Pages help you communicate the status of your Services to your customers at all times, as opposed to you getting to know the status of your Services through support tickets logged by your customers.

Monitor your Edgecast CDN with Datadog

Edgecast is a global network platform that provides a content delivery network (CDN) and other solutions for edge computing, application security, and over-the-top video streaming. Using Edgecast’s JavaScript-based CDN, teams can improve web performance by caching static and dynamic content with low latency and minimal overhead.

Feeling zen, finding DORA, and the policy police

We’ve had a bumper month here at incident.io HQ. We’ve welcomed 3 new joiners, celebrated two 1-year incident.io anniversaries (congrats Lisa and Lawrence!), released a whole load of exciting new features and (for those of you wondering what’s been causing the recent heatwave) redesigned our website, and it is on fire 🔥 😎 Here’s a round-up of some of this month's highlights…

Updating our data stack

It’s been over 6 months since Lawrence’s excellent blog post on our data stack here at incident.io, and we thought it was about time for an update. This post runs through the tweaks we’ve made to our setup over the past 2 months and the challenges we’ve found as we’ve scaled from a company of 10 people to 30, now with a 2-person data team (soon to be 3 - we’re hiring)!

20+ SysAdmin Tools You Can't Live Without

Being a system administrator is a high-level and demanding profession. Yes, we’re talking long hours (not counting overtime!), unforeseen events requiring attention, and so much troubleshooting. But SysAdmin life doesn’t have to be more challenging than it needs to be. That’s why we put together this list of must-have SysAdmin tools so you can optimize your workflow and focus on critical tasks.

How to Explain Zero Trust to Your Tech Leadership: Gartner Report

Does it seem like everyone’s talking about Zero Trust? Maybe you know everything there is to know about Zero Trust, especially Zero Trust for container security. But if your Zero Trust initiatives are being met with brick walls or blank stares, maybe you need some help from Gartner®. And they’ve got just the thing to help you explain the value of Zero Trust to your leadership: it’s called Quick Answer: How to Explain Zero Trust to Technology Executives.