February 2025

Observability for your NodeJS AWS Serverless Applications

Hi there, and welcome to the first video in this series on observing AWS serverless applications with Datadog. In this video, you’ll learn how easy it is to get started observing your serverless NodeJS applications using Datadog and the AWS CDK. You’ll also look at how you can use the Datadog console to diagnose latency issues and errors inside your application. You’ll walk away with an understanding of how to instrument your Lambda functions with the AWS CDK, as well as practical steps you can take to debug your applications.

Native AWS Integrations with AutoDiscovery

For developers, the main quest is building and scaling their applications—not struggling with complex monitoring setups. Yet, observability in cloud-native environments is essential, and configuring monitoring for AWS services has traditionally been a complex and manual process. Developers had to set up Firehose streams, CloudWatch metric streams, and log subscriptions, all while ensuring continuous maintenance for new instances, turning observability into an unwelcome side quest.

SRE Challenges & APM Solutions

Site Reliability Engineers (SREs) face constant challenges as cloud environments and microservices grow more complex. Performance issues often go unnoticed until they escalate, leading to downtime and disruptions. With Site24x7 APM, you can stay ahead of issues before they impact your business. Our Application Performance Monitoring (APM) solution provides real-time insights, predictive analytics, and deep visibility across your entire IT ecosystem—helping you.

How to use APM data to improve your CI/CD pipeline performance

Agile production has become the norm for software development cycles. The backbone for such a fast-paced landscape is the continuous integration and continuous delivery (CI/CD) pipeline. But merely depending on the CI/CD pipeline isn’t enough, even though the automated workflows give you a competitive edge. The pipeline needs to be optimized to function at its best. This is where monitoring your applications within the pipeline can be a game-changer.

Deeper Trace Analytics - Analyze Root & Entry Spans with Ease

Debugging distributed systems can often feel like searching for a needle in a haystack. When issues arise, engineers need faster ways to pinpoint critical spans within their traces. With our latest Deeper Trace Analytics update, SigNoz now enables powerful filtering for root and entry spans—making it significantly easier to analyze and debug distributed traces.

Out-of-box OpenTelemetry-powered Kafka & Celery monitoring

Messaging queues power modern distributed systems, handling background tasks, event-driven architectures, and real-time data streaming. However, debugging issues in Kafka and Celery queues has traditionally been a black box, with limited correlation between message producers, consumers, and broker metrics. With OpenTelemetry-powered Kafka & Celery monitoring, SigNoz introduces the industry's first fully integrated observability solution for messaging queues powered by OpenTelemetry.

eG Innovations' AIOps-Powered APM

I recently wrote about how eG Innovations' AIOps-powered monitoring benefits those working with Digital Workspaces; today I’ll cover how those same AIOps (Artificial Intelligence for IT Operations) capabilities also make the eG Enterprise platform a leader in the APM (Application Performance Monitoring) space. The eG Enterprise platform is equipped with capabilities for automated corrective actions, event-based triggers, and remote-control functionalities.

Deeper Trace Analytics - Quickly search through all spans, entry spans and root spans

Debugging distributed systems can often feel like searching for a needle in a haystack. When issues arise, devs need faster ways to pinpoint critical spans within their traces. With our latest Deeper Trace Analytics update, we now enable powerful filtering for root and entry spans — making it significantly easier to analyze and debug distributed traces.

Deeper Trace Analytics - Analyze Root & Entry Spans with Ease | SigNoz Launch Week 3.0 Day 4

Debugging distributed systems can often feel like searching for a needle in a haystack. When issues arise, devs need faster ways to pinpoint critical spans within their traces. With our latest Deeper Trace Analytics update, we now enable powerful filtering for root and entry spans — making it significantly easier to analyze and debug distributed traces.

Traces Without Limits - Load a Million Spans with SigNoz

Observability at scale is challenging—especially when dealing with high-volume distributed traces. Traditional tracing tools struggle with large traces containing thousands of spans, often leading to sluggish UIs and an unmanageable debugging experience. Most tracing tools we checked have a limit on the maximum spans they can load for a single trace. But with SigNoz, we’ve redefined what’s possible.

Stop Losing Sales! The Biggest UX Friction Traps in eCommerce

Friction in eCommerce is a silent sales killer. When customers hit roadblocks—slow pages, confusing layouts, unnecessary steps—they ditch their carts and move on. The problem? Many online stores create friction without even realizing it. But here’s the deal: Not all friction is the same. Some comes from clunky tech, while other issues stem from poor design choices or pushy sales tactics.

OpenTelemetry-Powered Infrastructure Monitoring - SigNoz Launch Week 3.0 Day 1

Today, we’re excited to announce a much-awaited feature in SigNoz: Infrastructure Monitoring. With our latest OpenTelemetry-powered Infra Monitoring, we bring you a native OpenTelemetry experience that seamlessly integrates infrastructure metrics with application performance data.

Out-of-the-box OpenTelemetry-powered Kafka & Celery monitoring | SigNoz Launch Week 3.0 Day 3

Today, we are excited to announce OpenTelemetry-powered messaging queue monitoring in SigNoz. Debugging issues in Kafka and Celery queues has traditionally been a black box, with limited correlation between message producers, consumers, and broker metrics. With our messaging queue monitoring, teams can correlate Kafka broker metrics with OpenTelemetry spans, enabling deep insights into consumer lag, throughput, drop rates, and performance bottlenecks.

What is Platform Engineering and Why is it Important?

Without the right frameworks in place, software development often feels like managing a project with too many moving parts and no cohesive plan. A good solution to this problem would be having a unified platform that streamlines processes, integrates tools, and provides consistency across the development lifecycle. That’s what platform engineering offers—it simplifies the complexities of software development by making it easier to build, deploy, and maintain digital infrastructure.

OpenTelemetry-Powered Infrastructure Monitoring

Today, we’re excited to announce a much-awaited feature in SigNoz: Infrastructure Monitoring, built natively on OpenTelemetry. Infrastructure monitoring is a critical aspect of modern observability. Without proper visibility into your infrastructure resources, troubleshooting issues, optimizing costs, and maintaining performance become challenging.

Migrating to Amazon DaaS - Part 1 - How to leverage AIOps monitoring during a migration to Amazon WorkSpaces or AppStream 2.0

If you are considering or planning a migration to Amazon WorkSpaces or AppStream 2.0, you’ll also want to consider how to integrate effective monitoring into your planning and execution. This will not only save you time and money in the long term but will also help you measure and achieve success.

Top 10 challenges for SREs and how to overcome them with APM tools

According to Google, “SRE is what you get when you treat operations as a software problem.” The role of site reliability engineers (SREs) is evolving rapidly to ensure optimal application performance in today's changing IT environments. SREs are expected to provide proactive and predictive solutions for the issues arising from managing such environments. A Gartner report even suggests that by 2025, 70% of organizations will depend on SRE practices to ensure operational resilience.

Top 10 .NET exceptions (part one)

Exception handling is essential to .NET development, but not all exceptions are equal. Some, like NullReferenceException, surprise developers with unclear stack traces and production crashes. Others, such as MySQLException or HttpRequestException, often point to issues like resource mismanagement or network failures. At Raygun, we've worked with teams around the world to monitor and fix software issues, giving us deep insight into how exceptions occur and how to handle them effectively.

Introducing Raygun CLI: Level-up your error tracking workflow

Raygun CLI is a powerful command-line interface tool designed to enhance the developer experience when working with Raygun’s error tracking and performance monitoring platform. With this tool, we bring Raygun’s features directly to your terminal, making it easier to integrate some important elements of Raygun Crash Reporting and error tracking into your development and CI/CD workflow. We are excited to announce the release of version 1.0.0 of Raygun CLI.