The latest news and information on incident management, on-call, incident response, and related technologies.
People working in IT support and incident management right now are faced with unusual difficulties supporting large remote workforces and managing unpredictable workloads. On Reddit, system admins and other IT pros are bemoaning the hiccups and hassles of working in isolation while trying to resolve issues and maintain high SLAs. You can’t go grab your indispensable SME for troubleshooting, because that person is also home and inundated with messages and alerts from many different tools.
Google Cloud Platform (GCP) is a collection of Google’s computing resources, made available to the general public as a public cloud offering. The GCP resources consist of physical hardware infrastructure (computers, hard disk drives, solid-state drives, and networking) housed in Google’s globally distributed data centers, where many of the components are custom designed using patterns similar to those available in the Open Compute Project.
As engineering teams shift from delivering services on monolithic architectures to microservices and even serverless environments, developers are no longer responsible only for creating and maintaining their code. Shared ownership has become the new normal (or is at least trending that way), so developers now respond to production incidents and, in some cases, join the on-call rotation. Incidents vary in impact, but they all take time away from innovation and from building new capabilities.