
Investigating the Database Family Tree

Investigating your family tree can be an interesting experience. For example, what if you discovered you were related to a famous person who won a Nobel Prize or performed a heroic act? Conversely, what if you realized you had an ancestor who was an infamous criminal? Just as examining your genealogy can be an exciting adventure, looking at the family tree of your database can prove just as rewarding. Databases occasionally undergo a phenomenon known as drift.

Incident Review - Akamai Performance Degradation Slows Down Major Websites Worldwide

This summer has seen a series of outages and performance degradations from some of the world’s most widely used CDNs, including the June 8, 2021 Fastly outage (owing to DNS or configuration issues) and an Akamai outage on July 22, 2021 (also likely caused by DNS failure).

An Introduction to Distributed Tracing

There’s no strict definition of a distributed system. But generally speaking, if you have reached a point where you’re running more than five interdependent services at once, that means you’re running a distributed system. It also means you are more than likely experiencing difficulties when troubleshooting using traditional debugging tools. Unfortunately, pulling up multiple tools, each built for a monolithic world, doesn’t help pinpoint the problem.

Achieving the Army's data imperatives at the tactical edge with Elastic

As the Industrial Age Army transforms into the Information Age Army, Army leadership recognizes the need for adaptable technologies that enable data exchange at the tactical edge. Not only must these technologies be in lockstep with the 8 guiding principles of the DoD Data Strategy, but they must also deliver on the Army’s data imperatives of speed, scale and resilience.

Essential Tools for Site Reliability Engineers

Site reliability engineers (SREs) are involved in scaling systems and making them reliable and efficient for organizations. But SREs often fail to build system resiliency when they do not have the right tools at their disposal. In this post, we’ll uncover five leading tools that SREs can use to drive the reliability and stability of computing systems. We’ll also examine how SREs can use these tools to improve operations tasks and infrastructure processes.

Smarter CPU Testing - How to Benchmark Kaby Lake & Haswell Memory Latency

Modern CPUs are complex beasts with billions of transistors. This complexity in hardware brings indeterminacy even to simple software algorithms. Let’s benchmark a simple list traversal. Does the average node access latency correspond to, say, a CPU cache latency? Let’s test it! Here we benchmark access latency for lists with different numbers of nodes. All the lists are contiguous in memory, traversed sequentially, and have 4 KB of padding between the next pointers.
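To make the list layout described above concrete, here is a minimal Python sketch. It is not the benchmark from the post. It assumes each node begins with an 8-byte offset to the next node, that nodes sit exactly 4 KB apart in one contiguous buffer, and that a hypothetical sentinel value marks the end of the list. The names `build_list`, `traverse` and `ns_per_access` are invented for illustration. Interpreter overhead dominates the timings in Python, so the numbers only show the shape of the experiment; measuring real cache latencies would need a compiled language.

```python
import struct
import time

STRIDE = 4096               # 4 KB between consecutive next pointers, as in the text
PTR = struct.Struct("<Q")   # assumption: a node starts with an 8-byte offset to the next node
END = 0xFFFFFFFFFFFFFFFF    # hypothetical sentinel stored in the last node


def build_list(node_count):
    """Lay out node_count nodes back to back in one contiguous buffer."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    buf = bytearray(node_count * STRIDE)
    for i in range(node_count):
        nxt = (i + 1) * STRIDE if i + 1 < node_count else END
        PTR.pack_into(buf, i * STRIDE, nxt)
    return buf


def traverse(buf):
    """Follow the next offsets from node 0 and return the number of nodes visited."""
    visited = 0
    offset = 0
    while offset != END:
        (offset,) = PTR.unpack_from(buf, offset)
        visited += 1
    return visited


def ns_per_access(node_count, repeats=20):
    """Average wall-clock nanoseconds per node access over several traversals."""
    buf = build_list(node_count)
    traverse(buf)  # warm-up pass before timing starts
    start = time.perf_counter_ns()
    for _ in range(repeats):
        traverse(buf)
    elapsed = time.perf_counter_ns() - start
    return elapsed / (repeats * node_count)


if __name__ == "__main__":
    # Sweep a few list sizes, mirroring "lists with different numbers of nodes".
    for n in (8, 64, 512, 4096):
        print(f"{n:5d} nodes: {ns_per_access(n):8.1f} ns/access")
```

The 4 KB stride means every node lands on a different page-sized chunk of memory, so the working set grows by 4 KB per node even though only 8 bytes of each node are read.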

Where configuration management falls short: model-driven OpenStack

Have you ever installed OpenStack from scratch? I know, it sounds geeky, unnecessary and maybe even overcomplicated … It is, after all, 2021. OpenStack is mature, there are hundreds of OpenStack distributions available, and configuration management tools are everywhere, so installing OpenStack from scratch almost sounds like compiling the Linux kernel or using make scripts to install software on Ubuntu.

How Kubernetes 1.22 addresses industry needs

On August 4th, 2021, Kubernetes (K8s) upstream announced the general availability of Kubernetes 1.22, the latest version of the most popular container orchestration platform. At Canonical, we actively track upstream releases to ensure our Kubernetes distributions align with the latest innovations that developers and businesses need for their cloud-native use cases.