Latest News

Making a Dog's Job Easier With Mass Notifications

Time-sensitive situations require immediate action so that a goal or task is achieved within a tight deadline. This is the case for Search and Rescue Dogs of Colorado (SARDOC), a non-profit organization dedicated to finding missing persons in mountainous and wilderness regions.

SREs Can Help Transform Enterprise IT

Transformation of IT Ops: When I think of the term IT Ops, I immediately think of Enterprise IT and the traditional attributes that make up this function, many of which are in the middle of an industry-wide disruption, and of the impact of that disruption. When we first looked at business process support, shadow IT and unaccounted-for IT spend at LinkedIn about 10 years ago, it was a bit of a revelation to me how much the landscape had already changed.

OnPage Mentioned in Gartner's Hype Cycle for ITSM 2019 Report

Gartner’s Hype Cycle for ITSM report highlights tools and technologies that improve IT operations. It’s a comprehensive, in-depth document that gives support teams insight into the latest innovations, industry trends and recommendations. The OnPage team is pleased to be included in the latest Hype Cycle for ITSM report, which lists OnPage’s solution as a trusted, reliable source for IT service alerting (ITSA).

How Opsgenie achieved 99.999% uptime over the last 12 months

At Opsgenie, our highest priorities are uptime and performance; our product’s very purpose is to enable our customers to keep their always-on services on – always. The Opsgenie team has achieved 99.999% uptime over the last 12 months, during which we enhanced our platform with new features and integrations and joined the Atlassian family.
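To put a 99.999% figure in perspective, the downtime budget it implies can be worked out with simple arithmetic. The sketch below is only an illustration: the function name `allowed_downtime_minutes` and the 365-day period are assumptions made here, not details taken from the Opsgenie article.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability target.

    availability_pct is a percentage such as 99.999; period_days is the
    measurement window (assumed to be 365 days unless stated otherwise).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the window is whatever the target leaves over.
    return total_minutes * (1.0 - availability_pct / 100.0)
```

Under these assumptions, a 99.999% target over 365 days leaves a budget of roughly 5.26 minutes of total downtime.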

Keep Your Business Stakeholders Updated While You Save the Day

Imagine this: An airline encounters a major IT incident in a data center that affects their ticketing system. Behind the scenes, technical responders are scrambling to diagnose and fix the issue. However, because today’s systems are so complex, this issue is taking longer than expected to resolve, and hours have passed since the system went down. Meanwhile, passengers are stranded and taking their anger out on customer service agents and sharing their frustrations on social media.

Reducing MTTR in the Field: 10 Simple Steps Using Retrace

The last decade has ushered in a golden era of software engineering. The rise of cloud computing freed companies from managing their own data centers and provided on-demand scaling. These services allow for provisioning servers on the fly using configuration and code. Treating that task as just another type of software development led to the advent of DevOps.

6 Best Practices For Outstanding Critical Incident Management

"Businesses need to face the inevitability of being hacked at some point. It's not a question of if, but when, and that's why being proactive to minimize the risk is essential." (Robert Egan) When a critical incident hits, what happens to an organization without an efficient incident management plan? Essentially, all stakeholders are left "fighting fires," trying to recover their systems and get their business back up and running.

Intent-based Capacity Planning and Autoscaling with Kubernetes

Intent-based Capacity Planning is Google's approach of declaring reliability intent for a service and then dynamically solving for the most efficient resource allocation plan. Learn how you can start using this approach to manage the reliability of the services running on your Kubernetes cluster.

Four Healthcare Workflows for Better Clinical Communications

Healthcare organizations strive to enhance patient experience, ensuring that patients receive proper treatment at the right time, every time. However, antiquated communication tools, such as the pager, make this goal difficult for some healthcare providers to achieve. Today’s healthcare facilities require an advanced pager replacement solution that integrates with intelligent scheduling systems and EMR solutions for better patient outcomes.

Calculating MTTR: An Evolution Driven by the Rise of DevOps

The shift to cloud computing and the DevOps revolution have fueled some important changes in the way we think about software development and monitoring. DevOps has delivered huge benefits to the companies that have fully embraced it. In fact, the DevOps Research and Assessment (DORA) 2018 industry survey found a new, small group of “elite” performers that deploy code far more often and have a far better mean time to resolution (MTTR) than the next closest group.
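For readers unfamiliar with the metric, one common way to compute MTTR is to average the time between when each incident was opened and when it was resolved. The sketch below assumes that definition; the function name `mean_time_to_resolution` and the (opened, resolved) pair layout are hypothetical, and this is not necessarily the method the article or the DORA survey uses.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def mean_time_to_resolution(
    incidents: Iterable[Tuple[datetime, datetime]],
) -> timedelta:
    """Average the resolution time over (opened, resolved) timestamp pairs."""
    durations = []
    for opened, resolved in incidents:
        if resolved < opened:
            raise ValueError("an incident cannot be resolved before it is opened")
        durations.append(resolved - opened)
    if not durations:
        raise ValueError("at least one incident is required")
    # Summing timedeltas needs an explicit timedelta start value.
    return sum(durations, timedelta()) / len(durations)
```

Given one incident resolved in 30 minutes and another in 90 minutes, this returns a mean of 60 minutes.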