
Latest Posts

Getting started with Packet Loss attacks

Imagine this: you're in the middle of an important presentation when all of a sudden your video feed starts to stutter. You hear other people speaking, but their words are choppy. A message comes through Slack from one of your co-workers: "I think your connection cut out." You scramble to try different solutions—restarting your videoconferencing application, checking your Internet connection, switching to your phone—but ultimately, your presentation gets cut short.

The Dual Approach in Scaling: Chaos Engineering and Performance Engineering

Most enterprises are all too familiar with the struggles and complexities of scaling their environments and applications. Whether these applications live on premises, in a cloud environment, or somewhere in between in a hybrid state, an age-old question engineering teams ponder is, “Can my application and environment scale?”

Podcast: Break Things on Purpose | Alex Solomon & Kolton Andrus: Break it to the Limit

Time for a crossover! Today, Page it to the Limit host Mandi Walls, DevOps Advocate at PagerDuty, joins Julie for a special episode. In this two-part episode, Julie and Mandi interview Kolton Andrus, co-founder of Gremlin, and Alex Solomon, co-founder of PagerDuty. Each of them shares the origins of his company, how they build amazing cultures, and some fun anecdotes from along the way.

Getting started with Latency attacks

As the world becomes more dependent on cloud-native systems, the tolerance for slow services is decreasing. Users expect instantaneous access to services, whether it's for work, entertainment, or even cloud infrastructure. Even small amounts of latency can significantly decrease user satisfaction: nearly half of all users expect web pages to load in under two seconds, and as many as 28% of users will permanently abandon a slow site.

Why Reliability Engineering Matters: an Analysis of Amazon's Dec 2021 US-East-1 Region Outage

In the field of Chaos Theory, there’s a concept called the Synchronization of Chaos—disparate systems filled with randomness will influence the disorder in other systems when coupled together. From a theoretical perspective, these influences can be surprising. It’s difficult to understand exactly how a butterfly flapping its wings could lead to a devastating tornado. But we often see the influences of seemingly unconnected systems play out in real life.

Podcast: Break Things on Purpose | Carissa Morrow: Learning to be Resilient

Being new in tech can be intimidating! Thankfully, folks like Carissa Morrow are shining examples of how to come into tech from the ground up. Carissa began with a career shift and just started coding, went through the Boise Codeworks bootcamp, and made the jump to tech. Carissa talks about the resilience it took in her early days, and how those experiences reinforced her attitude toward continual learning.

Podcast: Break Things on Purpose | Gunnar Grosch: From user to hero to advocate

Reliability and serverless are at the forefront of today’s conversation. For this episode, Gunnar Grosch, Senior Developer Advocate at AWS, is here to talk about Chaos Engineering, AWS Serverless, and the work that AWS is doing when it comes to reliability.

If you're adopting Kubernetes, you need Chaos Engineering

When Ticketmaster started their Kubernetes migration, they had to address a huge problem: whenever ticket sales opened for a popular event, as many as 150 million visitors flooded their website, effectively causing distributed denial of service (DDoS) attacks. With new events happening every 20 minutes and $7.6 billion in revenue at stake, outages could mean hundreds of thousands in lost sales.

Getting started with Time Travel attacks

It's the middle of the night when your phone goes off. You rub your eyes and unlock the screen to see a SEV 1 alert from your incident management tool. The application is down, multiple cloud server instances are offline, and the remaining instances are being overwhelmed by the sudden increase in demand. You jump out of bed and start trying to troubleshoot. You log into your cloud provider and try to provision systems manually, only to find out you can't.