Engineering teams that manage high-volume log sources, such as content delivery network (CDN) edges, streaming platforms, and authentication systems, often have to make a difficult retention tradeoff. Indexing every event keeps logs searchable during investigations, audits, and postmortems, but it can make long-term retention expensive.
If a problem reached you, it means nobody else could solve it. That makes it worth documenting right then, while you're in it. Six months from now, your team will thank you. Watch the full IT Leadership Lab session.
Incident response rarely fails because teams lack tools. More often, it fails because those tools are disconnected when pressure is highest. A monitoring system detects the issue. An ITSM platform holds the incident record. Engineers coordinate in chat. A bridge is created manually. A cloud team checks infrastructure events. Security teams review detections. Leaders ask for updates. Meanwhile, responders are jumping between systems, chasing context, and trying to make decisions quickly.
SRE Lead Ricard Bejarano (Cisco) and Jorge Lainfiesta (Rootly) sit down to talk about a recent intermittent incident that had the team scratching their heads.
AI looks great in a demo. The real test is production. In this week's Zero Ticket Minute, Ian explains why success isn't about what AI can do. It's about what it can reliably resolve.
This release roundup brings together Icinga Web SSO v1.0.0, Icinga Director v1.11.9, and Icinga vSphere Integration v1.8.4. It introduces OpenID Connect single sign-on for Icinga Web and includes compatibility updates for the upcoming Icinga PHP Library 1.0.0 release.
Stop guessing whether your repos meet your branch policies. Start knowing. In this Feature Friday, Senior Engineering Manager Gabriel walks through Cortex's new native support for GitHub branch rulesets and how to use them in scorecards to enforce consistent policies across all your repos. Questions? Reach out to your CSM or drop a comment below.