Debugging Python Cold Starts with Sentry Profiling and Improving Our P99 Latency by Several Seconds

At Sentry, we don't just build debugging tools for developers; we use them ourselves. This story shows how we used our own platform to solve a mysterious performance issue that was causing significant latency spikes in critical infrastructure used in nearly every backend request.

AppNeta Feature Highlight: Monitoring Policies

This year, we’ve been working hard to introduce monitoring policies, a new feature designed to simplify and streamline the monitoring configuration process. This set of features is a direct result of collaborating closely with our customers to understand their unique challenges. We've listened to your feedback and are excited to deliver a solution that makes monitoring more efficient and user-friendly than ever before.

How to Achieve SOC-2 Compliance on AWS

SOC-2 is a critical framework that ensures the security, availability, processing integrity, confidentiality, and privacy of systems and data. It is particularly important for organizations handling sensitive customer information. If you are using a cloud vendor, especially AWS, and aiming for SOC-2 certification, then this article is for you. We will provide insights into how AWS supports SOC-2 compliance and walk through a comprehensive roadmap and practical strategies for meeting these essential standards.

Product Update: Introducing User Groups for InfluxDB Cloud Dedicated

We are excited to announce the launch of User Groups, a major update that strengthens security through access control in InfluxDB Cloud Dedicated. This new feature allows for more granular access management by restricting what limited-access accounts can do. Giving customers more access control helps them implement the Principle of Least Privilege (PoLP) for improved security.

Smooth Sailing to Amazon EKS: How Cortex Scorecards Simplify Your Kubernetes Migration

Migrating a Kubernetes cluster from a self-hosted environment to Amazon EKS (Elastic Kubernetes Service) offers considerable operational benefits—but this transition also brings challenges in reliability, compliance, and observability. By leveraging Cortex Scorecards, teams gain clear insights and structured benchmarks to simplify and improve the migration process. Here’s how Cortex Scorecards can make your EKS migration efficient and resilient.

Create ServiceNow tickets from Datadog alerts

ServiceNow is a popular IT service management platform for recording, tracking, and managing a company’s enterprise-level IT processes in a single location. In addition to helping you manage your ServiceNow CMDB, Datadog also integrates with ServiceNow IT Operations Management (ITOM) and IT Service Management (ITSM), enabling you to automatically create and manage ServiceNow incidents and events from the Datadog platform.

Cost-Effective Strategies for Kafka Resource Management

Running Kafka at peak efficiency doesn’t come cheap. But with some smart tweaks, it’s entirely possible to keep costs down while making sure everything flows smoothly. The key is to balance your resource usage across CPU, memory, and storage to get the most bang for your buck. Let’s dive into some strategies that will help you stretch those resources, streamline your Kafka setup, and avoid breaking the bank.

Site Reliability Engineer's Guide to Black Friday

It’s gotten to the point where Black Friday reliability prep has to start on… well, Black Friday. This year, 32% of consumers in the US said they were going to start their holiday shopping between July and October. Plus, Black Friday isn’t the only day eCommerce businesses have to worry about: now we have Cyber Monday, Travel Tuesday, and the thousands of Prime Days from Amazon.