Chaos Engineering

How to Show Reliability Results to Your Organization

Jun 1, 2023 By Gavin Cahill In Gremlin

Building momentum for a reliability program can be tough. Improving reliability takes time, effort, and resources. But when everything from launching new features to improving security demands those same resources, it can be a struggle to get the buy-in you need to address reliability risks. And it makes sense! If a team spends time patching a known security bug or creating a new feature, they have a clear demonstration of the value created.

Chaos Engineering 2023 with Chaos Mesh

May 15, 2023 By Saiyam Pathak In Civo

We've seen a tremendous transition in the architecture of our systems over the years, from basic, linear systems to increasingly sophisticated, non-linear systems. We've moved away from monolithic programs, where a single person could comprehend the entire operation of a system, and toward a distributed world dominated by a microservices design.

Chaos Engineering 2023 with Chaos Mesh - Saiyam Pathak | KubeCon + CloudNativeCon Europe 2023

May 13, 2023 By Civo In Civo

In this video, Saiyam Pathak discusses the importance of chaos engineering in building resilient systems, with a special focus on the Chaos Mesh project. As systems transition from monolithic to distributed and cloud-native architectures, traditional testing methods fall short. Chaos engineering fills this gap by enabling experiments that simulate real-world failures, helping teams verify system reliability. Read our blog post on Chaos Engineering 2023 with Chaos Mesh.

Don't Just React to Incidents-Prevent Them

May 9, 2023 By Gavin Cahill In Gremlin

Incident response has been the cornerstone of reliability for decades. From digging in the server logs to navigating modern observability dashboards, responding quickly to incidents and outages is a big part of minimizing downtime. And it should be! When something breaks, your team should move as quickly as possible to address and repair the problem.

Chaos Engineering Tools: Myth vs Fact

Apr 4, 2023 By Gavin Cahill In Gremlin

With so many Chaos Engineering tools available, it’s no surprise that SRE and platform leaders are doing their homework when choosing a platform to help them build and scale their Chaos Engineering programs. But like anything else you can research on the internet, there’s a lot of noise and hype that you need to wade through. Gremlin works with Reliability Engineering teams at hundreds of companies with the most sensitive workloads—and has since 2016.

What is Gremlin?

Mar 28, 2023 By Gremlin In Gremlin

Today’s technology leaders are facing a reliability gap. Customers expect their apps to be fast and available. But with DevOps and distributed systems driving more speed and complexity, it’s harder than ever to find and fix the reliability risks that can impact customer experience before it’s too late. To close the reliability gap, we need a reliability strategy: one that’s proactive, measurable, built-in, and automated. We need a reliability management platform.

Five Trends from SREcon Americas 2023

Mar 27, 2023 By Gavin Cahill In Gremlin

Last week, over five hundred SREs gathered in Santa Clara to share the latest research, tips, tricks, best practices, and more for site reliability engineering. They were joined by some of the biggest names in the reliability space. And, yes, Gremlin was there to answer any and all questions about chaos engineering and proactive reliability. After three days of great conversations and insightful talks, let’s take a look at some of the themes we heard weaving through SREcon.

How Gremlin helps you meet Google's Infrastructure Reliability standards

Feb 8, 2023 By Andre Newman In Gremlin

In January 2023, Google released its infrastructure reliability guide, which provides guidelines for building high-availability applications in Google Cloud. While it's written for Google Cloud, it also offers excellent general-purpose guidance on architecting reliable applications on any cloud provider, organized around a set of key reliability factors. In this blog, we'll explain each of these factors and how you can use Gremlin to ensure you're meeting your reliability requirements.

Testing doesn't stop at staging

Feb 6, 2023 By Andre Newman In Gremlin

Imagine a perfect world where software releases ship bug-free. Developers write perfect code the first time, all tests pass without issues, operations teams effortlessly deploy builds to production, and customers never experience defects. Everyone's happy, and the Engineering team can focus exclusively on building and delivering features. Of course, we don't live in a perfect world.

The KPIs of improved reliability

Jan 31, 2023 By Andre Newman In Gremlin

For many businesses, prioritizing reliability is an ongoing challenge. Building reliable systems and services is critical for growing revenue and customer trust, but other initiatives—like building new products and features—often take precedence since they provide a clearer and more immediate return. That's not to say reliability doesn't have clear value, but proving this value to business leaders can be tricky.