
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Chapter Ten: In Which Sarah Resigns from Animapanions and Heads Off to Start Up a Competitor

This is the tenth chapter in The Observability Odyssey, a book exploring the role that intelligent observability plays in the day-to-day life of smart teams. In this chapter, our DevOps Engineer, Sarah, throws in the towel at C&Js and moves on to build her own business.

Chapter Eleven: In Which James Speaks with the Industry Analysts

This is the eleventh chapter in The Observability Odyssey, a book exploring the role that intelligent observability plays in the day-to-day life of smart teams. In this chapter, our IT Ops Leader, James, speaks with the analysts about what’s happening in the AIOps space.

Getting Started with Site Reliability Engineering

Site Reliability Engineer (SRE) is one of the fastest-growing jobs in tech, with LinkedIn reporting 34% year-over-year growth in 2020 and over 9,000 openings in its Emerging Jobs Report. If you’re new to SRE and exploring it as a career path, understand that it can be challenging but rewarding. Here are some quick tips on how to get started with SRE and jump-start your career.

Strategies to Strengthen Nurse Mental Health and Safety

No job is easy, but nursing is especially challenging, particularly during a global health crisis. Nurses are at a higher risk of burnout due to the psychological trauma and cognitive overload that come with the profession. The situation is further exacerbated when nurses take on more responsibility during a pandemic or other large-scale incident.

SLOs, SLIs, and where to find them with Jacob Plicque III

Identifying the right Service-Level Indicators is mission-critical for any SRE team responsible for meeting Service-Level Objectives and reporting on them. Find out how to sift through mountains of metrics and fill gaps in your data in order to visualize SLIs that actually matter for effective error budget tracking and actionable alerts in Grafana. Presented by Jacob Plicque III, Senior Engineer at Grafana Labs, at the Grafana East Coast Virtual Meetup, August 2021.
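For readers new to the terms, the arithmetic behind error budget tracking can be sketched in a few lines. This is a minimal illustration and is not taken from the talk: the function names, the request-based SLI, and the sample numbers are all assumptions chosen for the example.

```python
def sli(good_events: int, total_events: int) -> float:
    """Service-Level Indicator as the fraction of good events (hypothetical helper)."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as fully meeting the objective
    return good_events / total_events


def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; negative when overspent."""
    if total_events == 0:
        return 1.0
    # The budget is the number of bad events the SLO tolerates over the window.
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target leaves no budget at all.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


# Illustrative numbers only: a 99.9% SLO over one million requests with 500 failures
# leaves roughly half of the budget unspent.
remaining = error_budget_remaining(0.999, 999_500, 1_000_000)
print(f"SLI: {sli(999_500, 1_000_000):.4%}, budget remaining: {remaining:.1%}")
```

An alert rule would typically fire when the remaining budget drops below a chosen threshold, or when it is being consumed faster than the window allows.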

Real-time digital operations management puts connected vehicles on the road to success

As technology advances and applications for the Internet of Things (IoT) continue to expand, industrial and manufacturing companies are embedding more digital systems into their operations. From smart factories and intelligent shipping to automation and 3D printing, Industry 5.0 has arrived.

How to Avoid the Executive 'Swoop and Poop' and Other Best Practices for Operational Maturity

We’re eating at restaurants again. We’re seeing family after too long apart. Some of us may even be returning to the office. But that doesn’t mean the pressure is off for digital services, and growing in operational maturity remains top of mind. While digital transformations have been taking place for the last two decades, COVID-19 added pressure to speed up those initiatives.

Are You Spending Enough on Cybersecurity?

Cybercriminals do not discriminate among the organizations, people or industries they target. These actors look to exploit vulnerabilities in resources to intercept valuable data from small and medium-sized businesses (SMBs). Cyberattacks are inevitable, and organizations must have the right controls and information security systems to mitigate the impact of an attack.