
The Future of FinOps: Engineering, Applications & Cloud Cost Accountability

In this episode of the FinOps on Azure Podcast, Michael Stephenson is joined by Ben DeBow, Founder and CEO of Fortified, to discuss the next evolution of FinOps and why cloud cost management needs to move beyond dashboards, reporting, and allocation. Ben shares insights from years of helping enterprises optimize cloud spend and explains why the biggest savings opportunities are often hidden inside applications, workloads, and engineering decisions rather than in infrastructure.

Top Considerations When Evaluating DCIM Vendors

Choosing a Data Center Infrastructure Management (DCIM) platform is one of the more consequential decisions a data center team will make. Get it right, and you gain an accurate digital twin of your physical infrastructure, a single source of truth across teams, improved operational visibility, and a platform for planning, reporting, and automation. Get it wrong, and you risk a failed deployment, a platform that doesn't fit your needs, or a shelfware investment that's hard to justify renewing.

Running the OpenTelemetry Collector as a Lambda

The OpenTelemetry Collector is usually deployed as a long-running process: a sidecar, a DaemonSet, an EC2 instance, a Docker container on my computer. It sits there listening for telemetry. That's fine when I want to send telemetry all day, but not when telemetry is rare. Like right now, when I have an agent defined on AgentCore that runs maybe a few times a week. Or my website, which hardly sees any traffic. Can I run the OpenTelemetry Collector as a Lambda function?

Incident Prevention & Incident Assistant Demo - The best incident is one that never happens

The best incident is one that never happens. The BigPanda team recorded a live demo of the AI Incident Prevention & AI Incident Assistant as part of ITSM Week, hosted by the Service Desk Institute. ITSM teams are measured by how effectively they prevent disruption. Yet many teams still spend too much time reacting to noisy, low-context incidents after impact has already begun. Watch this on-demand session to learn how leading organizations are moving beyond manual firefighting to autonomous operations with Agentic AI.

Search and act across Datadog to resolve issues faster with Bits Chat

Finding the right information across dashboards, monitors, and telemetry sources takes time, even for experienced engineers. When something breaks, it often means figuring out where to start, rebuilding queries, and jumping between metrics, logs, and traces before you can take action. The challenge isn’t a lack of data but the effort required to surface the right information at the right moment.

Works on my machine: how we use AI to reproduce reported bugs

Sentry’s SDK teams maintain and support SDKs for a vast ecosystem of languages and frameworks. See our release registry for a source of truth. We’re currently at 159 published packages across the entire ecosystem. If you use it, we probably support it. All of these SDKs are open source and have their own GitHub repositories that we maintain on a daily basis. And like any other open source project, we get tons of bug reports and issues on these.

HAM Audit: How InvGate Asset Management Helps You Pass

A Hardware Asset Management (HAM) audit is a formal check of whether your hardware inventory reflects physical reality. It covers what devices exist, where they are, who has them, what state they're in, and how retired assets were documented out of the system. Most organizations don't fail HAM audits because their IT teams are negligent.
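The check described above, whether the inventory matches physical reality, can be pictured as a reconciliation between what is recorded and what is actually counted. The sketch below is only an illustration under stated assumptions: the `Asset` fields, the state names, the `reconcile` helper, and the sample IDs are all hypothetical and are not taken from InvGate Asset Management or any other product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical inventory record; field names are illustrative only."""
    asset_id: str
    location: str
    owner: str
    state: str  # assumed values: "in_use", "in_storage", "retired"


def reconcile(recorded, observed):
    """Compare inventory records with a physical count, both keyed by asset_id."""
    findings = {"missing": [], "unrecorded": [], "mismatched": []}
    for asset_id, record in recorded.items():
        seen = observed.get(asset_id)
        if seen is None:
            # A retired asset is expected to be absent from the floor.
            if record.state != "retired":
                findings["missing"].append(asset_id)
        elif seen != record:
            # Found, but location, owner, or state differs from the record.
            findings["mismatched"].append(asset_id)
    for asset_id in observed:
        if asset_id not in recorded:
            # Physically present but never entered in the inventory.
            findings["unrecorded"].append(asset_id)
    return findings


# Made-up sample data for illustration.
recorded = {
    "LT-001": Asset("LT-001", "HQ", "alice", "in_use"),
    "LT-002": Asset("LT-002", "HQ", "bob", "in_use"),
    "LT-003": Asset("LT-003", "Depot", "", "retired"),
    "LT-004": Asset("LT-004", "HQ", "carol", "in_use"),
}
observed = {
    "LT-001": Asset("LT-001", "HQ", "alice", "in_use"),
    "LT-002": Asset("LT-002", "Branch", "bob", "in_use"),
    "LT-005": Asset("LT-005", "HQ", "dave", "in_use"),
}

findings = reconcile(recorded, observed)
print(findings)
```

With this sample data, one device is missing, one is in a different location than recorded, and one was never entered in the inventory. A real audit would also examine the retirement paperwork itself, which this sketch does not model.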

A practical guide to standardizing app delivery without rebuilding everything internally

Standardize the route from code to production. Everything else is a team decision, not a platform problem. Most app delivery problems do not start with bad engineering. They start with too much variation. One team provisions environments manually. Another keeps deployment notes in a wiki. A third has a staging setup that only one engineer understands. Security reviews happen late because the platform does not make the safe path obvious.