Hot topics from KubeCon North America 2023

As another year at KubeCon + CloudNativeCon (CNC) draws to a close, the latest installment in Chicago might be one of our favorites so far! With talks focusing ever more on the impact of sustainability and inclusion within the community, we loved joining the conversation about how we can make the cloud a better solution for all.

Digging into the Optus Outage

Last week a major internet outage took out one of Australia’s biggest telecoms. In a statement out yesterday, Optus blames the hours-long outage, which left millions of Aussies without telephone and internet, on a route leak from a sibling company. In this post, we discuss the outage and how it compares to the historic outage suffered by Canadian telecom Rogers in July 2022.

From Oops to Ops: SLOs Get Budget Rate Alerts

As someone who has been living the Honeycomb ops life for a while, I've found SLOs to be the bread and butter of our most critical and useful alerting. However, they had severe, long-standing limitations. In this post, I will describe these limitations and how our brand new feature, budget rate alerts, addresses them. We usually don’t have SREs writing product announcements, but I’m so excited about this one that I said, “Screw it, I’m doing it!”

KubeCon North America 2023 event recap

As autumn graced the vibrant city of Chicago, I had the distinct opportunity to immerse myself in the heart of innovation and camaraderie at the CNCF’s KubeCon North America conference. Over the span of four remarkable days, from Nov 6-9, I was fortunate enough to walk alongside the many enthusiasts, contributors and organizers of the open source and cloud native communities.

How To Investigate a Reported Problem

Getting to the root cause of a problem in cloud-native environments requires engineers to navigate immense complexity within a distributed system. Oftentimes, you didn’t write the code and you lack the background and context to quickly understand what’s going on when a problem occurs. The stakes are even higher when a problem is reported, meaning it has already started to impact the business, and executives and customers are not pleased.

Java Application Monitoring - How IT Ops can Diagnose Memory Leaks at Scale

Many server-side applications are written in Java and often process tens of millions of requests per day. Key applications in various domains like finance, healthcare, insurance and education are often Java-based. When these applications slow down or fail, they affect the user experience and in turn, reduce business revenue. Behind many web forms or form-like GUIs there will often be a Java application.

Netplan brings consistent network configuration across Desktop, Server, Cloud and IoT

We released Ubuntu 23.10 ‘Mantic Minotaur’ on 12 October 2023, shipping its proven and trusted network stack based on Netplan. Netplan has been the default tool for configuring Linux networking on Ubuntu since 2016. In the past, it was primarily used to control the Server and Cloud variants of Ubuntu, while on Desktop systems it would hand over control to NetworkManager.

4 Reasons Why NOCs Need Incident Response Automation

Incident response in a Network Operations Center (NOC) is cumbersome and time-consuming. There are many steps, many sources of incidents, and a long list of complexities involved. For instance, incident response in a NOC begins with initial monitoring: the Tier 1 “eyes on glass” work of watching incoming alerts and determining what they indicate, such as a security breach, a performance issue, or a hardware failure.

Modernizing ITSM with ITIL 4: CMDB & service configuration management

The shift of the Information Technology Infrastructure Library (ITIL) 4 from process to value helps IT service management (ITSM) service providers demonstrate business value and adapt to change so they can meet business needs and customer expectations and thrive in an ever-changing technology landscape. However, in the face of an increasingly complex digital business environment, managing the diverse components of IT service operations can be challenging.