
The latest News and Information on Application Performance Monitoring and related technologies.

How to write annotations in Kubernetes with JSON for Datadog Autodiscovery | Datadog Tips & Tricks

Pod annotations in Kubernetes with invalid JSON syntax can prevent Datadog Autodiscovery from detecting integrations, resulting in missing metrics and gaps in monitoring. Watch this video for a step-by-step process to write annotations. Note: This video focuses on Datadog Autodiscovery v2 syntax.
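
One way to avoid hand-typed JSON errors is to generate the annotation value with a JSON library and check it before it reaches a manifest. The following is a minimal sketch, not taken from the video: the container name `redis`, the `redisdb` check, the instance settings, and both helper function names are illustrative assumptions.

```python
import json


def build_checks_annotation(container: str, check: str, instances: list) -> dict:
    """Return a one-entry annotations dict in Autodiscovery v2 form.

    The container and check names passed in are hypothetical examples.
    """
    key = f"ad.datadoghq.com/{container}.checks"
    # json.dumps guarantees double quotes and no trailing commas.
    value = json.dumps({check: {"init_config": {}, "instances": instances}})
    return {key: value}


def is_valid_json(text: str) -> bool:
    """Return True if the annotation value parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Assumed example values, for illustration only.
annotations = build_checks_annotation(
    "redis", "redisdb", [{"host": "%%host%%", "port": "6379"}]
)
for name, value in annotations.items():
    print(name, "valid" if is_valid_json(value) else "INVALID")
```

Typical hand-written mistakes, such as single-quoted keys or a trailing comma, fail the `is_valid_json` check.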

SSL/TLS Certificate Lifetimes to Reduce to 47 Days

Last year it was widely reported that the CA/Browser Forum had voted to significantly reduce the lifespan of SSL/TLS certificates over the next 4 years, with a final lifespan of just 47 days starting in 2029. The first reduction takes effect in a few weeks, on March 15, 2026, accelerating the need for organizations to automate their monitoring and renewal processes around certificate expiry.
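
As a rough sketch of what automated expiry monitoring might check, the snippet below computes the days left on a certificate and flags it for renewal. The 15-day renewal window and the function names are assumptions made for illustration, not a recommendation from the article.

```python
from datetime import datetime, timedelta, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining before the certificate's notAfter time."""
    return (not_after - now).days


def needs_renewal(not_after: datetime, now: datetime,
                  renew_before_days: int = 15) -> bool:
    """True once the remaining lifetime falls inside the renewal window.

    renew_before_days is an assumed threshold; choose one that suits
    your own renewal process.
    """
    return days_until_expiry(not_after, now) <= renew_before_days


# Hypothetical certificate issued with a 47-day lifetime.
issued = datetime(2029, 4, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=47)

print(needs_renewal(not_after, issued + timedelta(days=10)))  # 37 days left
print(needs_renewal(not_after, issued + timedelta(days=40)))  # 7 days left
```

With a 47-day lifetime, a fixed window of this kind leaves little slack, which is why a check like this would normally run on a schedule rather than by hand.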

Improve performance and reliability with APM Recommendations

SREs and application developers rely on telemetry data to understand and improve their systems. As organizations scale and evolve, those systems generate an ever-growing volume of metrics, logs, and traces. But more data alone does not make it easier to improve performance or reliability: Identifying meaningful optimizations still requires careful investigation and analysis.

NIS2 and CER Serve a Broader Purpose Than Cybersecurity - The 5 Biggest Risks You Need to Address Now

The European directives NIS2 (Network and Information Security Directive 2) and Critical Entities Resilience (CER) Directive have rapidly sharpened the conversation around digital resilience. While many organizations initially viewed these directives as an extension of their cybersecurity obligations, it is becoming increasingly clear that much more is at stake. These directives require a strategic transformation in how organizations manage risks, processes, and responsibilities.
Sponsored Post

How to improve your Crash Free Users score in minutes

If you're reading this blog, you likely already know the importance of quality software. But with the overwhelming number of metrics that can be monitored and improved, development teams struggle to decide which metrics to prioritize for the most significant impact. The Crash Free Users score in Raygun is a perfect place for development teams who care about software quality to focus their efforts. It tells you what percentage of users didn't encounter a crash or error while using your software and is an ideal north star to gauge the overall quality of your software.
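
The definition above amounts to simple arithmetic. The sketch below shows that calculation under the assumption that you have a count of total users and of users who hit at least one crash or error. It is not Raygun's implementation, and the function name is hypothetical.

```python
def crash_free_users(total_users: int, affected_users: int) -> float:
    """Percentage of users who did not encounter a crash or error."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= affected_users <= total_users:
        raise ValueError("affected_users must be between 0 and total_users")
    return 100.0 * (total_users - affected_users) / total_users


# Assumed example numbers: 2,000 users, 50 of whom hit a crash or error.
score = crash_free_users(2000, 50)
print(f"{score:.1f}% crash free")
```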

How Okta keeps 99.99 percent uptime with #datadog

How do you maintain 99.99 percent uptime across thousands of Kubernetes hosts and multiple cloud providers? Okta engineers explain why observability is critical to keeping authentication and authorization services running at scale. Watch how Okta uses Datadog to bring metrics, logs, and traces into a single view, speed up root cause analysis, and reduce time to mitigation while controlling costs.

Web Performance Metrics: Why INP Is Your Most Practical UX Performance KPI

Every developer has seen this scene: a user clicks a button, nothing happens, they click again—still nothing—and by the third frustrated tap, three overlapping modals explode onto the screen. The page wasn’t slow to load. It was slow to respond. This highlights the importance of perceived performance—how fast and responsive a website feels to users—which can shape user satisfaction regardless of actual load times.

Top 15 Application Performance Metrics for Developers and SREs in 2026

Every application tells a story of user intent, system behavior, and business impact. To truly understand how your application performs, you need to go beyond logs and errors. You need metrics that provide actionable visibility across your stack. Application performance metrics are the foundation for delivering high-quality digital experiences, and they empower DevOps teams, developers, engineers, and site reliability engineers (SREs) to respond faster, scale smarter, and continuously improve.

Redefining Application Management Services - the AIOps Way

For years, Application Management/Maintenance Services (AMS) have been the go-to solution for IT leaders trying to keep their business applications stable and running. The AMS pitch was simple: Hand over your apps to us, and we’ll manage and maintain them for you! And for a long time, that model has delivered promising results. It allows internal teams to focus on innovation while service providers handle the operational heavy lifting.

Debugging AI Agents in Production Without Losing Your Mind

AI agents are powerful, but debugging them in production is hard. Non-deterministic behavior, LLM latency, and token costs create observability challenges that traditional monitoring tools don't address. In this webinar, engineers from Inkeep and SigNoz walk through how Inkeep monitors its AI agent framework in production using OpenTelemetry-native observability.