Balancing Reliability at the Crypto-Finance Frontier with Brian Shaw (Uphold)

Sylvain Kalache sits down with Brian Shaw, Senior Engineering Leader at Uphold, to explore the reliability challenges that arise when operating at the intersection of traditional finance and crypto markets. Brian shares how unexpected market events can create massive traffic spikes, how their platform architecture and Kubernetes setup help them stay resilient, and why Uphold's transparency and regulatory approach make them both trustworthy and a high-profile target.

APM best practices: Dos and don'ts guide for practitioners

Application performance management (APM) is the practice of regularly tracking, measuring, and analyzing the performance and availability of software applications. APM gives you visibility into complex microservices environments, which can otherwise overwhelm site reliability engineering (SRE) teams. The insights it generates help teams deliver an optimal user experience and achieve desired business outcomes.

Aiven Surpasses $100M ARR: A Story of Community, Open Source, and What's Next

Today is a special day at Aiven. We are incredibly proud and humbled to announce that we have officially surpassed $100 million in annual recurring revenue. This is more than just a number on a dashboard. It’s a powerful milestone that reflects the trust of thousands of developers, the incredible innovation of our customers, and the relentless dedication of our global team—the Aiven crabs.

The Business Case for Network Automation: Cost Savings and Efficiency

Let’s get real: the cost of not automating your network operations is probably already showing up on your P&L, and not in the column you like. Manual configuration changes, ad hoc backups, and frantic compliance prep aren’t just operational headaches; they’re quiet killers of budget flexibility and scale readiness. Network automation is no longer a “nice to have” for companies with massive IT budgets or unicorn-level engineering teams.

How Agentic AI is Reengineering Advertising Revenue Operations: Workflows to Workforce

Digital advertising is experiencing a shift similar to manufacturing's industrial revolution. AI is automating routine tasks, freeing up human teams for higher-level strategic work, moving us from manual campaign management to automated systems where humans design the strategy rather than execute every detail. This represents the biggest operational change since programmatic advertising began.

From Zero to Dashboard in 10 Minutes with Telegraf, InfluxDB 3, and Grafana

In this tutorial, let’s walk through setting up a modern TIG stack in 10 minutes. TIG stands for three popular open source tools that complement each other: Telegraf, InfluxDB 3, and Grafana. They are often used to collect, store, and visualize time series data from servers, containers, APIs, or even IoT devices. We will be using a ready-to-use GitHub repository that includes.

Top Automation Use Cases for IT (in End User Computing)

As digital transformation continues to reshape the business landscape, IT teams are under more pressure than ever. Organizations demand faster service, always-on support, and seamless user experiences – all while IT budgets remain stagnant or even shrink. Organizations urgently need solutions that help them keep up without burning out their teams or inflating costs. This is where IT automation becomes essential.

IT Event Console: Centralize Logs, Correlate Alerts, and Detect Incidents

When you’re just starting out, you might picture yourself managing your IT infrastructure like Tom Cruise in Minority Report—key information projected in front of you, predicting events before they happen, controlling everything at the speed of thought with cinematic gestures on some kind of holographic computer.

Introducing DNS Monitoring - Stay Ahead of DNS Issues Before They Impact You

We’re excited to announce a powerful new addition to your monitoring toolkit: DNS Monitoring is now available on UptimeRobot! DNS (Domain Name System) is a core component of internet functionality. When DNS records are misconfigured, hijacked, or simply expire, they can lead to serious outages, broken email services, or even security risks. That’s why we’ve introduced DNS Monitoring – to help you stay in control of your domain’s health at all times.
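
To make the idea of a DNS check concrete, here is a minimal sketch in Python. It is not UptimeRobot's implementation; the function names (`resolve_ipv4`, `check_dns`), the result format, and the example hostname and addresses are all assumptions made for illustration. It resolves a hostname with the system resolver and compares the answer against the addresses you expect, which is one simple way to notice a misconfigured or hijacked A record.

```python
import socket
from typing import Callable, Dict, Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its set of IPv4 addresses via the system resolver."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def check_dns(
    hostname: str,
    expected: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> Dict[str, str]:
    """Compare resolved addresses with the expected ones (hypothetical result format)."""
    try:
        actual = resolver(hostname)
    except socket.gaierror as exc:
        # The name did not resolve at all, e.g. an expired or deleted record.
        return {"hostname": hostname, "status": "unresolvable", "detail": str(exc)}

    expected_set = set(expected)
    if actual == expected_set:
        return {"hostname": hostname, "status": "ok", "detail": ""}

    # The name resolves, but not to what we expect: possible misconfiguration or hijack.
    missing = sorted(expected_set - actual)
    unexpected = sorted(actual - expected_set)
    return {
        "hostname": hostname,
        "status": "mismatch",
        "detail": f"missing={missing} unexpected={unexpected}",
    }
```

Calling `check_dns("example.com", ["192.0.2.10"])` on a schedule and alerting on any status other than `"ok"` would approximate a basic monitor. The `resolver` parameter exists only so the check can be exercised without network access.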

LangChain Observability: From Zero to Production in 10 Minutes

LangChain apps are powerful, but they’re not easy to monitor. A single request might pass through an LLM, a vector store, external APIs, and a custom chain of tools. And when something slows down or silently fails, debugging is often guesswork. In one instance, a developer ended up with an unexpected $30,000 OpenAI bill, with no visibility into what triggered it. This blog shows how to avoid that using OpenTelemetry and LangSmith. With this setup, you’ll be able to.