Olly for SREs: 3 ways I actually use it in production

There’s a moment after an alert where you’re not fixing anything yet. You’re trying to answer a much simpler question: Is it actually down? Sometimes it’s obvious. Sometimes it’s 20 alerts at once with no clear starting point. Sometimes it’s a small upstream degradation that might cascade. Sometimes it’s just a spike that resolves on its own. That first phase is orientation. Is the signal real or transient? Is it isolated or spreading? Root cause or symptom?

The data context gap: an evaluation guide for agent-ready infrastructure

Why do AI agents that look brilliant in a sandbox fail the moment they hit production? For platform leaders, the answer is a lack of environmental parity, meaning the ability to interact with the exact data state and service topology where the actual bugs live. When an agent attempts to modify a schema, optimize a query, or reproduce a bug without access to the real-world data state, it hits the Data Context Gap.

Expanding Uptime Monitoring Down The Stack: ICMP Monitors Are Now Available In Checkly

When we started building Checkly's uptime monitoring suite, the goal was to give engineering teams complete visibility across every layer of their stack, from application down to network, in one place. URL, TCP, DNS, and Heartbeat monitors covered a lot of that ground. But one fundamental piece was missing: the ability to simply ping a host and know if it's reachable.

When Your Plant Talks Back: Conversational AI with InfluxDB 3

No one wants to stare at a plant and guess if it needs water. It’s much easier if the plant can say, “I’m thirsty.” A few years ago, we built Plant Buddy using InfluxDB Cloud 2.0. The linked article is still a great guide for cloud-first IoT prototyping as it shows how quickly you can connect devices, store time series data, and build dashboards in the cloud with the previous version of InfluxDB. But this time, the goal was different.

Bring Clarity and Confidence Back to Ops: How Trustworthy Guidance Sets a New Standard

For years, enterprises have chased the promise of artificial intelligence as a remedy for growing operational complexity. It seemed logical that if environments were expanding faster than teams could keep up, smarter models could fill the gap. But early deployments of generic AI proved a difficult truth. Intelligence alone does not create operational clarity. It does not guarantee safety.

Context is the New Currency: Building a Context-aware Enterprise with Agentforce

Corporate investment in Generative AI is outpacing value realization. While Large Language Models (LLMs) possess vast general reasoning capabilities, they suffer from a critical blind spot: they are pre-trained on the public internet, yet completely blind to your enterprise reality. This context gap renders even the most advanced models ineffective, forcing them to guess (hallucinate) rather than reason based on your specific business rules.

Say Goodbye to ZooKeeper: Automated, Zero-Downtime KRaft Migrations Now Available on Aiven

The Apache Kafka ecosystem has been steadily moving toward a simpler, more scalable architecture with KRaft (Kafka Raft), leaving ZooKeeper behind. In March 2025, Kafka 4.0 dropped support for ZooKeeper entirely. Since June 2025, all new Aiven for Apache Kafka services have been deployed with KRaft by default, allowing our users to benefit from faster partition scaling and simplified cluster management.

How AI Agents Communicate: Understanding the A2A Protocol for Kubernetes

Since the rise of Large Language Models (LLMs) like GPT-3 and GPT-4, organizations have been rapidly adopting Agentic AI to automate and enhance their workflows. Agentic AI refers to AI systems that act autonomously, perceiving their environment, making decisions, and taking actions based on that information rather than just reacting to direct human input.

Smarter Postgres Monitoring: Compare Queries, Spot Unused Indexes, and Diagnose Waits

This is a guest post from Adrian Tan. Over recent months, we’ve been steadily improving PostgreSQL monitoring in Redgate Monitor, with a singular focus: to help Postgres users diagnose performance problems faster, with less manual investigation. The latest updates and new features tackle this problem in a few different ways.