Latest Posts

Empowering organizations with reliable continuous delivery for Kubernetes applications

Managing application updates in production and ensuring the reliability of software releases in Kubernetes environments can be challenging. Small changes can sometimes lead to unforeseen issues in production. These unexpected problems, combined with the lack of scalability and the high costs associated with managing complex solutions, can be daunting.

The AIOps and Automation Handshake: Managing the Modern IT Stack

To increase business agility, IT organizations are deploying dynamic, modern architectures enabled by virtualization technologies. That includes containers, elastic clouds, microservices, and virtual machines. If you are rethinking your IT stack, you must also reconsider its management. IT operational silos limit business velocity.

How to use AIOps to Modernize Without Compromise

While the Biden administration aggressively pushes federal agencies to modernize their IT infrastructures, ITOps managers are left wondering how to do so without making network management more complex than it already is. Modernization necessitates the addition of more tools, which can easily lead to tool sprawl and increase technical debt. Managers are already using multitudes of vendor-specific tools to monitor different devices and applications. The last thing they want is to add more.

Beyond Microservices: Miniservices, Macroservices, and the In-Between

Containerized microservices have been the gold standard for cloud computing since they replaced the monolith architecture over a decade ago. The flexibility, scalability, and velocity they enable for teams make them an obvious choice. Yet, a strict interpretation of one service for one function doesn’t quite serve everyone, especially when architectures get large. We’ll discuss how flexibility in service architecture might be the way to go.

How an APM Alternative Helps You Do Observability Right

Every software-driven business strives for optimum performance and user experience. Observability, which allows engineering and IT Ops teams to understand the internal state of their cloud applications and infrastructure based on available telemetry data, has emerged as a crucial practice in support of that goal. For years, application performance monitoring (APM) was the de facto practice and tooling that organizations used to keep tabs on their critical systems.

How the Prometheus community is investing in OpenTelemetry

Goutham Veeramachaneni, a product manager at Grafana Labs, and Carrie Edwards, a senior software engineer at Grafana Labs, are both contributors to the Prometheus open source project. This post, which they wrote together, was originally published on the Prometheus.io blog in March 2024. The OpenTelemetry project is an observability framework and toolkit designed to create and manage telemetry data such as traces, metrics, and logs.

Agnostic AI: How to Not Choose an AI Provider

Artificial Intelligence (AI) technologies are evolving at breakneck speed. Today's cutting-edge model may become obsolete tomorrow. This rapid evolution, while exciting, presents a challenge – how to leverage the current best AI capabilities without being tied to a single model or provider. At InvGate, we apply a strategy that we call “agnostic AI” (as in platform agnostic).

How Incidents Foster Leadership

To become battle-tested, you need to go through battles, not just read books or mentor newcomers. Both are helpful, but the stakes are low. On the other hand, high-stakes jobs, such as running a big project or managing a team, are hard to get when you lack experience. So how can we solve this dilemma? Enter incident response.

An SRE's Most Important Skill? Communication

I wish someone had told me that I shouldn’t hop between frameworks. Just like learning four programming languages in your first year, in my experience, spending time context switching as a beginner is wasted effort. If I’d spent a solid year learning how to deploy services on AWS, then when it was time to learn Azure, I’d see more similarities than differences and find it a lot easier to pick up a second public cloud.