What is Gremlin?

Today’s technology leaders are facing a reliability gap. Customers expect their apps to be fast and available. But with DevOps and distributed systems driving more speed and complexity, it’s harder than ever to find and fix the reliability risks that can impact customer experience before it’s too late. To close the reliability gap, we need a reliability strategy. One that’s proactive, measurable, built-in, and automated. We need a reliability platform.

3 Ways to Sell DORA to Your Boss

If you've bought into the concept of DORA, and now it's time to get your boss on board, these three tips will help you succeed. Just remember: Give Sleuth a try and see how we give teams actionable insights on how to improve, no-code automations to instantly ship improvements, and metrics to measure their impact, all in a way that both managers and developers love.

400x deploy frequency? One team's DORA success

Is 400x deploy frequency possible? One team achieved it with the DORA philosophy and metrics. It doesn't happen overnight, but it's possible if you commit to it. Nathen Harvey shares a DORA success story.

Gremlin for DORA compliance: how financial services firms build digital resilience and prove it

The Digital Operational Resilience Act (DORA) is set to significantly impact the financial sector. Coming into full effect in 2025, this EU regulation will set new standards for information and communications technology (ICT) risk management. In this landscape, how can financial firms ensure they’re not only compliant, but also operationally resilient?

What Is the True Cost of Downtime for Businesses?

The financial and operational ramifications of downtime have become increasingly pronounced over the past seven years. In 2014, Gartner estimated that downtime costs organizations an average of $300,000 per hour. However, recent statistics stand in sharp contrast to this six-figure estimate, with 44% of organizations now counting their hourly downtime costs at over $1 million, exclusive of the ensuing penalties or legal fees.

Canonical announces supported solution for Apache Spark on Kubernetes

Today, Canonical announced the release of Charmed Spark – an advanced solution for Apache Spark® that provides everything users need to run Apache Spark on Kubernetes. Apache Spark is suitable for use in diverse data processing applications including predictive analytics, data warehousing, machine learning data preparation and extract-transform-load (ETL).

Adding automation to monitoring: Azure troubleshooting simplified

The transition from traditional on-premises IT infrastructure to the public cloud has brought substantial relief to IT decision-makers and sysadmins. Since many organizations use Microsoft Windows as their preferred operating system, Microsoft Azure has naturally become their public cloud provider of choice, owing to its familiar GUI and Active Directory sync.

Reproducing and testing distributed system failures with xk6-disruptor

Distributed systems, such as modern microservices-based applications, are highly scalable, but also highly complex. Dependencies and unexpected interactions between services are a common cause of incidents, and these incidents are also notoriously hard to test for. xk6-disruptor — an extension that adds fault injection capabilities to Grafana k6, the open source reliability and load testing tool — can help overcome these challenges.