Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Impressions from Gartner IOCS 2023

Gartner’s IT Infrastructure, Operations & Cloud Strategies Conference (IOCS) is an annual event that attracts ITOps, SRE, and DevOps leaders from around the world. As Gartner explains, IOCS “brings the world’s technology leaders together to hear top trends, find objective answers, and explore topic coverage in addition to best practices. Gain the insights and guidance to create an effective pathway to the future and network with your peers.”

Why monitoring your application is important

Effective monitoring and observability tools are critical for modern enterprises. Daily operations, digital transformation, moving to a cloud-native architecture, and an ever-evolving tech stack all require ITOps, DevOps, and SRE teams to monitor increasingly complex systems. So what happens if your applications suddenly cease to function? Every moment of downtime translates to lost income, decreased customer satisfaction, and harm to your company’s reputation.

APAC Retrospective: Learnings from a Year of Tech Turbulence

Throughout 2023, one thing has become abundantly clear: regardless of an organization’s size or industry, incidents are inevitable. Recently across the APAC region, we’ve seen numerous regulatory bodies clamp down on large companies that are failing to provide acceptable service, with some handing out quite severe penalties. For many, the cost of an incident is no longer just lost revenue and customer trust, but also financial penalties and business restrictions.

Understanding ServiceNow Incident Management: A comprehensive guide

You’re focused on swiftly identifying, analyzing, and resolving disruptions in IT services. And you know all too well that correctly deploying and adopting incident management holds the key to delivering a more reliable and responsive IT environment for your applications and services. That’s why you’re using or are considering using ServiceNow’s incident management to ensure a structured and efficient approach to handling your IT service incidents.

Better Incidents Winter Bonfire: Inside On-Call

Engineers are bombarded with pages left and right. There's uncertainty about how to escalate. A constant blur exists between what's urgent and what can wait. This never-ending ping-pong game takes a toll. Burnout creeps in, and before you know it, your engineering culture has taken a nosedive.

BookMyShow's Cinematic Product Journey - Incidentally Reliable Podcast with Viraj Patel

Grab some popcorn and catch Viraj talk about his experiences and BookMyShow's journey from its inception in the early 2000s to the entertainment behemoth it is today, their stints innovating at the forefront of the mobile and e-commerce revolutions, and their harmony with reliability engineering in the colourful, ever-changing yet challenging world of movies and online ticketing. Exclusively on The Incidentally Reliable podcast — made by SREs for SREs, hosted by Zenduty.

LLM Monitoring and Observability

Large Language Models (LLMs) are advanced artificial intelligence models designed to comprehend and generate human-like language. With millions or even billions of parameters, these models, like GPT-3, excel in natural language processing, understanding context, and generating coherent and contextually relevant text across various applications.