
Latest Posts

Ubuntu Explained: How to ensure security and stability in cloud instances, part 2

You probably know that it is important to apply security updates. You may not be as clear on the details of how to do that. We are going to explain best practices for applying Ubuntu updates to single instances and what the built-in unattended-upgrades tool does and does not do.

Cloud backup: improve your disaster recovery plans

Today, the lowest-cost backup medium per terabyte is still tape, even after factoring in the handling costs of manually loading and unloading tape libraries and the logistics of off-site storage. However, while inexpensive, tapes are inflexible. When used as an offline solution, they can take many hours to retrieve from off-site storage, not to mention the additional time required to load them into a tape library before a recovery can even start.

Kubecon 2023: Code, Culture, Community, and Kubernetes

Kubecon 2023 was more than just another conference to check off my list. It marked my first chance to work in the booth with my incredible Kentik colleagues. It let me dive deep into the code, community, and culture of Kubernetes. It was a moment when members of an underrepresented group met face-to-face and experienced an event that had previously not been an option for them.

The only way to measure developer productivity without causing a revolt

In an article titled The Worst Programmer I Know, Dan North, the creator of behavior-driven development, writes about a nearly fired developer he saved from the unemployment line. This developer consistently delivered zero story points, even though delivered story points was the primary metric for developer productivity at their (unnamed) software consultancy.

A Closer Look at AlertBot's Email Reports

Here at AlertBot, we know that our customers don’t want to get bogged down with mountains of raw information about their websites and related processes. Instead, they want clear, organized, and reliable intelligence that tells them: what happened recently, what’s happening now, what’s likely to happen in the near future — and what they can do about it. That’s where email reports enter the story.

Validate JSON files against schema in Azure DevOps build

JSON files have become part of our daily lives. We use JSON files for all sorts of tasks like settings, defining database schemas, and much more. The other day I found out that invalid JSON files had been pushed to one of our repositories. So, I decided to include JSON file validation as part of our build on Azure DevOps. In this post, I'll share the solution. I'm sure you can think of a scenario where invalid JSON files either do not parse as valid syntax or don't conform to the intended format.
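As a rough illustration of the two failure modes mentioned above (a file that does not parse, and a file that parses but does not match the intended format), here is a minimal Python sketch. It is not the solution from the post: it uses only the standard library, the function names `check_json_file` and `check_tree` are hypothetical, and the "format" check is reduced to a list of required top-level keys. A real build would more likely run a full JSON Schema validator from a pipeline step.

```python
import json
from pathlib import Path


def check_json_file(path, required_keys=()):
    """Return a list of problems found in one JSON file (empty list means OK)."""
    try:
        doc = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Syntax failure: the file is not valid JSON at all.
        return [f"{path}: invalid JSON ({exc.msg} at line {exc.lineno})"]
    # Format failure: valid JSON, but not the shape we expect.
    if not isinstance(doc, dict):
        return [f"{path}: expected a JSON object at the top level"]
    # Assumption for illustration: "conforming" only means these keys exist.
    return [
        f"{path}: missing required key '{key}'"
        for key in required_keys
        if key not in doc
    ]


def check_tree(root, required_keys=()):
    """Check every *.json file under root and return all problems found."""
    problems = []
    for path in sorted(Path(root).rglob("*.json")):
        problems.extend(check_json_file(path, required_keys))
    return problems
```

A build step could call `check_tree` on the repository root and fail the build when the returned list is not empty.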

Should data teams consider incident management tools to respond to pipeline issues?

Data teams are adopting more processes and tools that align with software engineering, and from talks at the dbt Coalesce conference in 2023, there’s clearly a big push towards adopting software engineering practices at enterprise-scale companies. At the moment, there are a lot of tools in the data space for identifying errors in data pipelines, but no tools for responding to these errors, such as coordinating fixes. This is exactly where it makes sense to implement an incident management platform.

Scaling Engineering Teams

The software engineering world has become a place where compute, storage, and availability have become the cornerstones of scale. As an industry and as individuals, we should stop to take a closer look at scaling the most important of all resources… our people. In this post I’ve modeled a team with 6 engineers: 2 Sr, 3 Mid, and 1 Jr. This team is getting 450 “units” of work done (where a unit is just some measure of throughput) per interval (2 months).

What is the Role of AIOps in Modern Network Management?

In IT, the introduction of Artificial Intelligence for IT Operations (AIOps) has been nothing short of revolutionary. As networks become increasingly complex and data-driven, traditional network management methods are proving inadequate. AIOps has emerged as a critical tool in the arsenal of network managers, offering innovative solutions to manage and optimize networks in real time.

How SpyCloud Architected Its Cribl Stream Deployment

In this livestream, I talked to Ryan Saunders, Manager of Security Operations at SpyCloud, about how he used the Cribl Reference Architecture to build a scalable deployment. He explained how this approach enabled SpyCloud to grow alongside its evolving needs without requiring significant rework. The reference architecture also facilitated a repeatable data-onboarding process, reducing administrative time and allowing the team to focus on critical security and data analysis tasks.