Operations | Monitoring | ITSM | DevOps | Cloud

Squadcast

Transparency in Incident Response

When your production systems are hit with a critical issue, you can trust your DevOps team, your Sysadmins or your SREs to get the system back on track. This is a no brainer. And in turn, these folks need to be able to trust the rest of the team to let them do their jobs, be it engineering, customer support or product management. But where does this trust come from? It comes from understanding - the more you understand, the more you can trust.

Danny Mican on his experience as an SRE at Auth0

Danny is an SRE at Auth0 and currently manages the reliability of systems that authenticate over 2.5 billion logins per month and is expected to have 99.9% (Three Nines) availability. He loves learning about systems and making changes that positively impact client happiness, employee happiness and long term stability and growth.

The Age of Service Mesh

You have built a massively successful system. The users just can't get enough and request new features. Your developers crank out new services on a regular basis. Your DevOps/SRE team configures and scale your Kubernetes cluster (or clusters). As the system becomes more complicated and sophisticated you realize that there are common themes that repeat across all your services.

LISA19 - Lightning Talk by Squadcast : How to SRE without an SRE on Your Team

Squadcast is an incident management tool that’s purpose-built for SRE. Create a blameless culture by reducing the need for physical war rooms, centralize SLO dashboards, unify internal and external SLIs and automate incident resolution with Squadcast Actions and create a knowledge base to effectively handle incidents.

Pavlos Ratis shares his experience on being an SRE

Pavlos is a Site Reliability Engineer based in Munich, Germany. He likes building software and expanding his knowledge around the reliability of services and their infrastructure. He has created a few open-source SRE projects such as the awesome-sre, Wheel of Misfortune, Availability Calculator, and awesome-chaos-engineering to assist teams and individuals in getting on board with the SRE culture.

Managing technical risk effectively with Error Budgets

Tradeoffs are hard. Think about the time when you had to choose between two equally compelling options - (a) addressing technical debt or (b) pushing out that long-awaited feature release, and risk breaking production. Or when your team couldn’t agree on where to draw the line on improving request latency versus shipping a major new update.