So you need to add microcontrollers to your fleet: now what?

Your Ubuntu Core fleet is running beautifully. OTA updates roll out in minutes. Every device is strictly confined, cryptographically attested, and backed by a 10-to-15-year long-term support (LTS) commitment. The operations team sleeps soundly. Then the product roadmap meeting happens. The industrial floor needs vibration sensors on every motor. The smart building needs temperature nodes in every room. The cold chain system requires dozens of low-power Bluetooth tags. And someone just said the words.

Measuring engineering organizations in the age of AI

Engineering leadership is in the middle of a real transition, and most of the leaders I talk to know it. AI has reshaped how software gets built quickly enough that the operating models many of us spent a decade refining no longer fit cleanly, and there is a great deal of serious work happening across the industry to figure out how these models should evolve. The teams I find most impressive right now are the ones treating their operating model as an open question rather than a settled one.

How to land on the right side of the AI divide

AI changed how code gets written before it changed how code gets operated. Generation accelerated; the downstream controls that turn that output into reliable, secure software at a reasonable cost did not keep pace. The result is elevated risk, distributed unevenly across engineering organizations. A recent survey explains why the distribution is so uneven.

Should platform, SRE, and security merge into one function?

Platform, SRE, and security are three distinct functions in modern engineering orgs, each shaped by a different problem. SRE was the operations function's answer to scale: how to keep systems reliable when the systems get big. Platform answered a different problem: how to let developers ship without becoming infrastructure experts. Security drew the line on what could safely reach production.

Agent governance starts with the service catalog you already run

Last month, an AI agent running inside Cursor wiped PocketOS's entire production database, including its backups, in roughly nine seconds. The agent found an API token in an unrelated file, originally created for managing custom domains, and used that token to execute the deletion. The backups sat inside the same blast radius as the database the agent was operating against. Nine months earlier, a Replit AI agent had done the same thing to a SaaStr database during a designated code freeze.

The audit-ready engineering org

Two weeks before the audit, the Slack messages start. "Get me a screenshot of this." "Can you screenshot the CI/CD logs?" "Can you add the names of the artifacts deployed to production, when they were deployed, and when the incident happened?" Senior engineers stop shipping. A spreadsheet appears. The product roadmap goes on hold while four people chase down ownership data and evidence that should have existed all along. This fire drill is the symptom of an operating model problem.

What Architecture Ensures Long-Term Scalability in a Rails-Based B2B Platform?

Scalability is not a feature you add later; it is a choice made at the architectural level from day one. A Rails-based B2B platform that handles growing clients, data, and transactions without slowdowns or costly rewrites is built on a modular design, clear domain boundaries, background job processing, caching, and a database strategy that supports load distribution and horizontal scale. Get these foundations right, and you stay in control of growth instead of reacting to problems after they appear.

Your platform team's name is holding it back

When you stood up your platform team, you probably spent more time on the org chart than on what to name it. Reporting lines, headcount, scope of the first charter: those felt like the real decisions. The name was administrative, something to put in Slack and the directory and forget about. Yet the name was the most consequential decision you made. The name you give a platform team isn't just branding. It's a scope declaration.

Context Engineering: How to Manage AI Context at Scale

Context engineering is the practice of managing the information an AI model sees (documents, tool outputs, memory, and structured metadata about the systems it reasons over) so it can make accurate decisions inside a real engineering organization. Most engineering teams have access to the same AI coding agents: Claude, GPT, Gemini, the major variants everyone is shipping. The model is no longer the differentiator.

Ask Cortex anything, right from Slack

The Monday morning thread. Someone asks who owns checkout-service. Someone else asks what changed in the Production Readiness Scorecard last week. A third person wants to know if the Kubernetes migration is blocking the launch next Thursday. The answers exist. They live in Cortex. But getting them into the thread means someone stops what they're doing, opens a tab, finds the data, and pastes it back. By the time they do, the conversation has moved on.