
Run Local LLMs on Mac to Cut Claude Costs

Part of the motivation for this post is how cloud API economics are shifting: Anthropic is moving large enterprise customers toward per-token, usage-based billing (unbundled from flat seat fees), which makes “always call the API” a moving cost line for teams at scale. A hybrid or local layer is one way to keep spend bounded while you still use premium models where they matter.

Top tips: When leaders leave, here's how to keep your IT systems stable

Top Tips is a weekly column where we look at what’s shaping the tech world and share practical ways teams can stay prepared for what’s next. This week, we’re focusing on a situation many teams underestimate—what happens to your IT systems when a key leader steps away, and how you can build stability that doesn’t rely on any one person. Some problems don’t show up when things are running smoothly. They show up when someone leaves.

The job is not to write code. It's to produce business value.

Most engineers can tell you exactly how many PRs they merged last quarter. Far fewer can tell you what any of it did for the business. The best engineering leaders can. They draw a straight line from their team's work to ARR: which reliability investment protected revenue, which migration unblocked a strategic customer, which operational improvement reduced churn. They lead with outcomes, not story points.

When agents orchestrate agents, who's watching?

You used to monitor services. Then you started monitoring AI calls inside services. Now your AI agent is spinning up other AI agents to complete tasks. Your old monitoring instincts need to evolve. This isn't hypothetical. Agentic architectures are already in production. Coding agents are calling search agents; orchestrators are spawning specialized sub-agents for retrieval, planning, and execution. Teams are shipping these systems faster than they're figuring out how to watch them.

What does using AI for post-mortems actually mean?

Everyone is using AI to help with post-mortems now. The pitch is obvious: post-mortems are time-consuming, the blank page is brutal, and AI is very good at producing structured, confident-sounding documents quickly. We're not here to push back on that. We've built AI into our own post-mortem experience, pulling your Slack thread, timeline, PRs, and custom fields together and giving your team a meaningful starting point in seconds. We think that's genuinely valuable, and the teams using it agree.

How it feels to run an incident with AI SRE

We've been building the broader incident.io platform for several years now, and one thing we've learned is that UX matters more here than almost anywhere else. When an incident fires, there's no room for poorly designed interfaces or fumbling through features you haven't touched in a while. The product has to be ergonomic: easy to pick up, easy to navigate, with the right things at your fingertips at exactly the right moment. We've put a lot of effort into this over the last 5 years.

How Recurring Instability Turns into Clinical Trial Delays

In pharma, reliability becomes an operational priority because research and trial work depend on systems performing consistently across different teams, locations, and conditions. Much of that work sits inside scientific workflows, remote sessions, and compute-heavy environments where behaviour can shift with configuration or load. When that consistency starts to break down, teams keep moving, but time is lost in small increments across the day.

Why Your PromQL Availability Query Returns Nothing When Services Are Healthy

Your SLI query shows No Data when availability is actually 100%. Here's why PromQL returns empty results instead of zero, and the label-preserving fix. Prathamesh works as an evangelist at Last9, runs SRE Stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of all terms related to observability.
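To make the empty-result behaviour concrete, here is a minimal Python sketch that models PromQL instant vectors as dictionaries keyed by label set. It is an illustration under stated assumptions, not the article's own code: the helper names (`divide`, `vector_or`, `scale`, `availability`), the `service` label, and the sample values are all hypothetical, and the `or ... * 0` pattern is a commonly used idiom that may differ from the fix the post describes.

```python
# Toy model of PromQL instant vectors: {label_set: value}.
# All names and numbers here are hypothetical, for illustration only.

def divide(lhs, rhs):
    """One-to-one matching: only label sets present on BOTH sides produce a result."""
    return {labels: lhs[labels] / rhs[labels] for labels in lhs if labels in rhs}

def vector_or(lhs, rhs):
    """Like PromQL `or`: keep all of lhs, add rhs elements whose label set is missing from lhs."""
    out = dict(rhs)
    out.update(lhs)
    return out

def scale(vec, k):
    """Multiply every sample by a scalar; label sets are preserved."""
    return {labels: value * k for labels, value in vec.items()}

def availability(errors, total):
    """availability = 1 - errors / total, per label set."""
    return {labels: 1 - ratio for labels, ratio in divide(errors, total).items()}

checkout = (("service", "checkout"),)
search = (("service", "search"),)

total = {checkout: 120.0, search: 80.0}
errors = {checkout: 30.0}  # search had no errors, so no error series exists for it

# Naive query: the healthy service drops out of the result entirely.
naive = availability(errors, total)

# Assumed idiom, roughly (metric names are placeholders):
#   sum by (service) (rate(http_errors_total[5m]))
#     or sum by (service) (rate(http_requests_total[5m])) * 0
# The right-hand side supplies a zero with the same labels where errors are absent.
fixed = availability(vector_or(errors, scale(total, 0)), total)

print(naive)
print(fixed)
```

In this sketch the naive result contains only the `checkout` series, which mirrors a panel showing No Data for a service with no errors. With the zero-filled fallback, `search` appears with an availability of 1.0 and keeps its `service` label.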