DevOps

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Updatable Ubuntu Server Live Installer

The Ubuntu Server Live Installer, introduced with the release of Ubuntu 18.04 LTS (Bionic Beaver), provides a live Ubuntu Server environment along with a streamlined server installation experience. Building on guided installs for LVM, RAID, encrypted disks and advanced networking configuration (VLANs and bonds), the installer can refresh itself to the latest version during the live session.

On-Demand Webinar - Modernizing Monolithic Apps with AWS & Stackery

A few weeks ago, Stackery had the pleasure of participating in a webinar with leaders from AWS and MasterStream ERP, a telecom-quoting company that has quite the architectural modernization story to tell due to their adoption of serverless with Stackery. Our very own Farrah Campbell (Ecosystems Director) sat down for a fireside chat with Santiago Cardenas (Sr.

Monitor ProxySQL with Datadog

ProxySQL is a MySQL/MariaDB protocol–compliant load balancer and reverse proxy with native support for a range of popular backends including ClickHouse, Amazon Aurora, and Amazon RDS. ProxySQL efficiently distributes queries to your database servers and caches results, improving resource management and boosting database performance. You can also configure ProxySQL for high availability to reduce downtime.

Performing chaos in a serverless world  Gunnar Grosch  Failover Conf 2020

Chaos engineering is the practice of hypothesis testing through planned experiments to gain a better understanding of a system’s behavior. The principles of chaos engineering have been around for years, and we have now reached the point where chaos engineering has gone from being a buzzword and a practice used by a few large organizations in very specific fields to being put into use by companies of all sizes and industries.

Swim Don't Sink: Why Training Matters to a Site Reliability Engineering Practice  Jennifer Petoff

Do you offer training to the engineers in your organization, or do you throw them in at the deep end to “sink or swim”? Providing training and education is universally important to set team members up for success in your organization and is critical for establishing a thriving Site Reliability Engineering (SRE) or DevOps practice and culture in the first place.

Fight, Flight, or Freeze - Releasing Organizational Trauma  Matt Stratton  Failover Conf 2020

When humans are faced with a traumatic experience, our brains kick in with survival mechanisms. These mechanisms are the familiar fight or flight response, but can also include the freeze response - which occurs when we are terrified or feel that there is no chance of escape.

Y2K and Other Disappointing Disasters: Risk Reduction and Harm Mitigation  Heidi Waterhouse

Every disaster is a concatenation of smaller failures. How can we design software and processes to accept that we live in an imperfect world? Explore the concepts of resiliency, harm reduction, over-engineering, and planning for failure with real examples.

How to fail with Serverless  Jeremy Daly  Failover Conf 2020

Everything fails all the time. Knowing how to deal with these failures in serverless applications becomes essential to building resilient, highly available systems. In traditional monolithic applications, catching errors and handling retries is relatively straightforward. But as our systems become more distributed, we now have multiple (often asynchronous) components processing events from several sources, all with vastly different retry behaviors and failure mechanisms. Utilizing old patterns can cause errors to get swallowed, creating brittle, unreliable systems that are difficult to debug and hard to maintain.

Slowdown is the New Outage  Marco Coulter  Failover Conf 2020

While outage-driven news headlines can cause stock prices to plummet in the short term, performance-driven reputation loss is a slow burn that leads to longer-term customer loss. This session compares slowdowns vs outages and the resulting need for insight more than observability. By understanding these differences, you'll be ready to drive agile applications, gain funding for lowering technical debt, and focus on customer retention.

The Halo of Resilience Engineering  J. Paul Reed  Failover Conf 2020

Recent world-impacting events have caused us all to have to rethink the way we go about our daily work; in this talk, we'll look at how some of the pillars of Resilience Engineering might help you and your team deal with the changes we're all being forced to confront.