Latest News

Attaching incident playbooks to Azure monitor alerts for rapid remediation

Aug 10, 2020 By Vishwa Krishnakumar In Zenduty

Incident response playbooks are sets of actions that your incident responders need to execute, depending on the nature of the outage. Having well-defined incident response playbooks can be critical, especially during high-customer-impact events that you would typically classify as Sev-0 incidents.

Make Informed Care Decisions With an EHR and Communication Tool Integration

Aug 7, 2020 By Ritika Bramhe In OnPage

Electronic health records (EHRs) are real-time patient health record systems made to securely share patient information with authorized users. Users include those in medical labs, imaging facilities, pharmacies and emergency departments. Essentially, EHRs provide medical information to everyone involved in the patient-care continuum. OnPage continuously explores new ways to expand its value and enhance business processes and workflows for clients.

The Importance of Reliability Engineering

Aug 6, 2020 By Emily Arnott In Blameless

If you’ve spent any time in tech circles lately, there are three letters you’ve surely heard: SRE. Site Reliability Engineering is the defining movement in tech today. Giants like Google and Amazon market their ability to provide reliable service and startups are now investing in reliability as an early priority. But what makes reliability engineering so important?

Improving Postmortems from Chores to Masterclass with Paul Osman

Aug 5, 2020 By Blameless Community In Blameless

At our 2019 Blameless Summit, Paul Osman spoke about how to take postmortems or incident retrospectives to a new level. The following transcript has been lightly edited for clarity. Slides from this talk are available here. Paul Osman: I lead the SRE team at Under Armour. Who here knows about Under Armour as a tech company? Does anybody think about Under Armour as a tech company? Under Armour makes athletic attire, shirts and shoes.

Nishant Singh shares his thoughts on being an SRE

Aug 5, 2020 By Squadcast In Squadcast

Nishant Singh is an SRE at LinkedIn based in Bangalore. Currently, he is working on building and maintaining applications that improve the overall MTTD (mean time to detect) and MTTR (mean time to recover) of the site. He likes to build services and play with the latest technologies. Before LinkedIn, Nishant worked for a few companies in the security and e-commerce domains as a DevOps engineer, where he was primarily responsible for building infrastructure, deployment pipelines and security.

Network Operations Center Best Practices (in 2020)

Aug 5, 2020 By AlertOps In AlertOps

Your Network Operations Center (NOC) is responsible for network monitoring, incident response, and other network operations activities — and you want to optimize its performance. To achieve your goal, your NOC team assesses data and explores ways to improve its everyday operations. The team may also implement NOC best practices or craft some of its own. NOC teams manage network availability and performance, along with servers, databases, firewalls, devices, and related external services.

Top Five Reasons Why Companies Are Choosing OnPage Over Competitors

Aug 5, 2020 By Christopher Gonzalez In OnPage

OnPage’s intelligent incident management system is the alerting solution of choice for industry-leading organizations. Since the beginning, companies have invested in the OnPage system for its advanced capabilities, out-of-the-box integrations and unmatched 24/7 customer support. Though we can provide a comprehensive view into OnPage’s competitive advantage, here are the top five reasons why customers continue to trust OnPage’s incident management system.

Telemetry Everywhere: Observability in the DevOps Cosmos

Aug 4, 2020 By Juan Perez In Moogsoft

Rockets constantly blast off into space headed towards planets, aiming to create shiny new stars, while meteors whizz by them, threatening their journeys. That’s how global DevOps expert Helen Beal describes the complicated and risky universe of DevOps practitioners and SRE teams. The rockets are these teams’ frequent code releases. Planets represent customers that benefit from the value — stars — created by these launches.

August 2020 Update: Manage service and system categories in the web portal and define responsibilities centrally

Aug 4, 2020 By René In SIGNL4

Our August update makes it easy to assign team responsibilities for individual systems through our categories. Team members can still do this individually in the mobile app, and the team administrator can now also do it centrally in the web portal. All details can be found in this blog article.

How to Bring Operational Experience to your Development with GitHub's Lauren Rubin

Aug 4, 2020 By Blameless Community In Blameless

At the 2019 Blameless Summit, Lauren Rubin spoke about how to bring operational expertise to development teams. The following transcript has been lightly edited for clarity. Lauren Rubin: I was going to ask for a show of hands of how many people here are on call right this minute right now. I am actually on call right this minute. I like to live dangerously. If my phone beeps, the specific noise that means I have been paged, I'm sorry, I am going to look at it.
