Latest News

Interpreting your reliability test results

Sep 19, 2024 By Andre Newman In Gremlin

Gremlin’s default suite of reliability tests analyzes critical functions of modern services: scalability, redundancy, and resilience to dependency failures. Services that pass this suite of tests can be trusted to remain available during unexpected incidents. But what happens when a service fails a test? How do you take failed test results and turn them into actionable insights? This post aims to answer that question.

What's Chaos Monkey? Its Role in Modern Testing

Sep 17, 2024 By Muhammad Raza In Splunk

Chaos Monkey is an open-source tool whose primary use is to test system reliability against random instance failures. It follows the principles of chaos engineering, which prepares networked systems to stay resilient under random and unpredictable conditions. Let’s take a deeper look.

Release Roundup August 2024

Sep 9, 2024 By Andre Newman In Gremlin

Over the past year, the Gremlin team has focused on giving you more tools to adapt Gremlin to your organization’s reliability needs. We started with customizable reliability tests, and now we’ve released customizable role-based access controls (RBAC). We’ve also made it easier to target specific availability zones when running Failure Flags experiments, and to run experiments behind a proxy. Keep reading to learn more!

Reliability recommendations when adopting Kubernetes

Sep 3, 2024 By Andre Newman In Gremlin

Kubernetes just celebrated its tenth birthday. That’s 10 years of microservices, containers, service meshes, and many other paradigms that are now common to many developers’ toolkits.

How to verify, document, and prove compliance with Gremlin

Aug 29, 2024 By Gavin Cahill In Gremlin

Resilient and reliable IT systems have become a minimum requirement for modern businesses, a fact driven home by any number of high-profile outages over the past few years. Unfortunately, when those outages hit the financial sector, they can have far-reaching and incredibly damaging results.

How to test AWS managed services with Gremlin

Aug 1, 2024 By Andre Newman In Gremlin

Note: In this blog, we use “managed service providers” to refer to companies that provide hosted computing services, not managed IT service providers (MSPs).

When was the last time you thought about the reliability of your cloud dependencies? The biggest challenge with using cloud platforms and SaaS services is also their biggest strength: the provider controls everything.

How role-based access control (RBAC) works in Gremlin

Jul 25, 2024 By Andre Newman In Gremlin

Reliability testing and Chaos Engineering are essential for finding reliability risks and improving the resiliency of systems. Gremlin makes both easy, but not every engineer needs access to the same experiments, systems, or services. That’s why we released customizable role-based access controls (RBAC), which let Gremlin customers control which actions their users can perform in Gremlin.

Destroy on Friday: The Big Day. A Chaos Engineering Experiment - Part 2

Jul 23, 2024 By Lex Neva In Honeycomb

In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen. This is part two, where I go over the big day. How did our chaos engineering experiment go? Find out below!

Chaos Testing Explained

Jul 19, 2024 By Shanika Wickramasinghe In Splunk

Chaos testing is a part of site reliability engineering (SRE). In chaos testing, we intentionally break things in and around a given application. The purpose is to assess how software systems respond to scenarios like network outages, hardware failures, database failures, and server or cluster node failures in the infrastructure.

Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment - Part 1

Jul 16, 2024 By Lex Neva In Honeycomb

We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in our production environment using AWS’s Fault Injection Service. You might be wondering why the heck we did something so drastic. In this post, we’ll go over why we did it and how we made sure that it wouldn’t impact our service.
