Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Single Sign-On Now Available on OnPage Enterprise-Level Accounts

Single sign-on (SSO) services provide a unified view into applications, logins and devices through a secure identity cloud. SSO allows users to access SaaS-based applications through one simple login process. We, at OnPage, are excited to announce that we’ve extended our integration catalog to include SSO services like Okta and OneLogin. Through a single sign-on process, OnPage enterprise-level users can access the OnPage dashboard from their Okta and OneLogin accounts.

New Integration: Declare FireHydrant Incidents from Checkly Alerts

Streamlining your incident management process is what we do best, and one of the ways we do that is by acting as the connective tissue across all of your applications. We’ve partnered with Checkly to bring you a new integration that empowers you to detect problems and resolve incidents faster.

Use Datadog's Notebooks API to programmatically manage your notebooks

Datadog Notebooks simplify the way teams across an organization find and share knowledge. By bringing together live data and rich Markdown text, Notebooks help teams create powerful, data-driven documents—from runbooks and support playbooks to incident postmortems and data reports. And with collaboration functionalities like real-time editing and commenting, team members can simultaneously make changes to a document and gather feedback along the way.

If everyone is AIOps - which AIOps is right for you?

With so many IT vendors claiming they provide AIOps platforms, how do you understand the differences between them, and decide what flavor of AIOPs to choose for your organization? Join us in a CTO Perspective discussion with Elik Eizenberg, CTO and co-founder at BigPanda, to find the answer. Read the skinny for a brief summary, then either lean back and watch the interview, or if you prefer to continue reading, take a few minutes to read the transcript. Enjoy!

SRE Availability Metrics

How available is your website, service, or platform? What must you monitor and measure to ensure availability? How do you translate uptime into availability? This chart has numbers that every Site Reliability Engineer (SRE) should know. Below the chart, you will find answers to commonly asked questions about SRE and associated metrics.

A Day in the Life: Intelligent Observability at Work with our SRE, Dinesh

When I asked Charlie for permission to attend this year’s AICon (virtual, natch) I thought it would be a shoo-in; learning’s part of my OKRs after all. But he never makes things easy and his ‘yes’ came with a caveat that’s typical when dealing with him. This time, he claimed he didn’t have the budget for the ticket (a likely story!) and I’d have to find another way to get one.