The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Minimize the impact of critical incidents with Freshservice On-Call Management

“Service outage! Help!” These words (or their variations) have preceded notable losses of millions, even billions, of dollars in the 21st century. From large corporations to SMBs, no one is immune to the effects of downtime, whether planned or unplanned. However, the earlier an issue is noticed, the faster it can be acted upon and resolved, resulting in little or no customer impact.

Monitoring & Observability for Sales, Marketing and Business ops teams with Stack Moxie and PagerDuty

Before Stack Moxie, every business ops team needed PagerDuty, but finding and pushing errors was a manual process. With Stack Moxie + PagerDuty, every business ops professional can manage their sales, marketing, HR or customer success stack with the same quality engineers bring to code.

OnPage's Clinical Communication and Collaboration Solution

Modern healthcare teams require a modern solution to streamline clinical communications and medical workflows. In life and death situations, it’s critical that physicians receive immediate alerts and messages to provide patient care promptly. OnPage is the industry’s most trusted clinical communications platform. OnPage is more reliable and secure than traditional pagers. The system enables care teams to easily communicate and achieve maximum patient satisfaction.

4 IT Challenges Addressed by OnPage Automated Alerting

IT organizations are challenged with delivering quick, effective resolution to customers’ database, hardware or software downtime issues. Contractually binding service-level agreements (SLAs) place further pressure on IT engineers to accelerate incident resolution time and minimize downtime. Though engineers are obligated to meet their SLAs, they are unable to do so without the help of an automated alerting system.

Logs and tracing: not just for production, local development too

We're a small team of engineers right now, but each engineer has experience working at companies that invested heavily in observability. While we can't afford months of time dedicated to our tooling, we want to come as close as possible to what we know is good while running as little as we can, ideally buying, not building. Even with these constraints, we've been surprised at just how good we've managed to get our setup.

Avoid frostbite: Stop doing code freezes

As the holiday season aggressively approaches, I want to make a public service announcement for everyone toying with the idea of a code freeze for the holidays: please don't. It’s getting cold outside and the season of peppermint mochas is upon us, which might get you thinking about putting a code freeze in place for the holidays. A word of warning: instituting a code freeze may have unintended consequences.

How to improve your influence as an SRE

Improving your influence within the company will help you deliver high-quality work, as your goals will be closely aligned with those of the company. In this blog piece, Ricardo explains how to improve your influence as an SRE. Balancing fast-paced business requirements with the demands of keeping production services stable is not an easy task.