Four Ways to Adapt ITSM to an Agile World

The transition to Agile development and continuous deployment has given rise to the DevOps movement, which aims to break down organizational walls. While there are many benefits to this approach, some best practices of traditional IT Service Management (ITSM) have been lost in the transition. Which ITSM processes and controls are still relevant, and how can you adapt them to the new Agile world?

Shipping Clean Code at Sentry with Linters, Travis CI, Percy, & More

Shipping clean, safe, and correct code is a high priority for engineering at Sentry. Bugs are best discovered before they hit production: once released, they have real user impact and can quickly drain even a high-performing team’s resources. The later in the development cycle a bug is found, the longer it will take to fix.

Building a more reliable infrastructure with new Stackdriver tools and partners

Every software organization faces challenges in keeping applications available and running reliably. At Google, we’ve developed and practiced a discipline known as Site Reliability Engineering (SRE). Following SRE practices lets us build and operate services reliably for our billions of users. Google has about 2,500 Site Reliability Engineers who support both internal and external services.

Another Journey of Chaos Engineering

Chaos engineering is here to stay. There's a thriving community, numerous open source projects, a few books, even a startup. Companies are hiring chaos engineers and creating entire teams focused on chaos engineering. This talk is about strategies for launching a chaos engineering movement at your company, as well as the challenges and results you can expect.

Accelerating Incident Response

Incidents are never fun, but a bad incident response process makes them even less so. How do technical teams mobilize the right people and provide the right context and tooling to rapidly take action and drive incident resolution? With the clock ticking and up to millions of dollars lost per minute of downtime, there’s no time to waste in assembling the right experts.