Operations | Monitoring | ITSM | DevOps | Cloud

Latest Blogs

Featured Post

Incidents are lessons, not failures

Delivering digital operations excellence - DevOps, incident management, and keeping organisations running - is a constant challenge. As customer digital expectations rise, so do the complexities of the tech stack and cloud services integrations. But to insist on 100% uptime and rush through incident management without taking learnings into account creates a poor culture that can damage the ability of the DevOps team. This is not how a business creates resilient infrastructure and high-performing teams.

Don't Buy the Hype: The GenAI Power You Already Have

Your database is more powerful than you think. Learn how built-in vector capabilities can power your GenAI applications and save you from the hassle of adopting a new database. The heart of Generative AI (GenAI) workloads rely on the ability of computers to categorize and understand the world's data (images, sounds, text) as numerical representations called vectors. This is achieved through a process called "embedding," where a model translates the data into vectors.

Why Your Telemetry(Observability) Pipelines Need to be Responsive

At Mezmo, we consider Understand, Optimize, and Respond, the three tenets that help control telemetry data and maximize the value derived from it. We have previously discussed data Understanding and Optimization in depth. This blog discusses the need for responsive pipelines and what it takes to design them.

Kubernetes 1.31 - What's new?

Kubernetes 1.31 brings a plethora of enhancements, including 37 line items tracked as ‘Graduating’ in this release. From these, 11 enhancements are graduating to stable, including the highly anticipated AppArmor support for Kubernetes, which includes the ability to specify an AppArmor profile for a container or pod in the API, and have that profile applied by the container runtime.

How Network Observability Helps Lay the Foundation of Autonomous IT Operations

We often hear the term "observability" in the context of DevOps and how SREs use telemetry data. Collecting and analyzing this telemetry data is a vital first step to a successful autonomous IT operations strategy. Observability can help you find out about problems in your system you didn’t know you had—and before your users are impacted—by giving you new visibility that your monitoring systems don’t provide. But any observability initiative must also include network observability.

Leveraging AI for Efficient On-call Scheduling

Regardless of industry specifications, creating and maintaining a highly functional incident management process is crucial for organizations of all sizes. The various potential applications of Generative AI in this process can significantly enhance the efficiency, accuracy, and speed of incident detection, analysis, and resolution. GenAI can be utilized across all stages of the incident management process, including preparation, response, communication, and learning.

How our data team handles incidents

Historically, data teams have not been closely involved in the incident management process (at least, not in the traditional “get woken up at 2AM by a SEV0” sense). But with a growing involvement of data (and therefore data teams) in core business processes, decision making, and user-facing products, data-related incidents are increasingly common, and more important than ever.

MongoDB use cases for the telecommunications industry

A trusted database is fundamental to the smooth and secure operation of telecommunications services:, from network management and customer service to compliance and fraud prevention. MongoDB is one of the most widely used databases (DB Engines, 2024) for enterprises, including those in the telecommunications industry. It provides a sturdy, adaptable and trustworthy foundation. It also safeguards sensitive customer data while facilitating swift responses to rapidly evolving situations.