Latest News

AlertOps Partners With Cisco AppDynamics to Enhance Major Incident Resolution

Chicago, IL – May 17, 2022 – AlertOps, a major incident response management platform, announced today a new technology integration partnership with Cisco AppDynamics, the leading Application Performance Monitoring (APM) and full-stack, business-centric observability solution. This new relationship empowers joint AlertOps and AppDynamics users with intelligent alerting, escalation policies, workflows, and scheduling to rapidly remediate major incidents.

A Chat with Lex Neva of SRE Weekly

Since 2015, Lex Neva has been publishing SRE Weekly. If you’re interested enough in reading about SRE to have found this post, you’re probably familiar with it. If not, there are a lot of great articles to catch up on! Lex selects around 10 entries from across the internet for each issue, focusing on everything from SRE best practices to the socio- side of systems to major outages in the news. I had always figured Lex must be among the most well-read people in SRE, and likely #1.

Announcing our new Webex Meetings integration

Previously, FireHydrant supported video collaboration tool integrations for Zoom and Google Meet. In response to customer requests, today we are pleased to introduce our new Cisco Webex Meetings integration for all paid plans. With the new integration, teams can automate Webex bridge creation as part of incident response.

On-Prem? Cloud? Hybrid? What is my best option?

With the Cloud Bridge introduction, I started reaching out to our customer base to make sure people are aware of what this feature can do. Usually, I try to keep our customers up to date through blogs or the occasional webinar, but with the Cloud Bridge, I went for a more personal approach. Reaching out to our customers individually presented a unique opportunity to educate them on our Cloud Bridge and, by extension, what SIGNL4 can bring to the table.

How to empower your team to own incident response

Responding to and managing incidents feels fairly straightforward when you’re in a small team. As your team grows, it becomes harder to figure out the ownership of your services, especially during critical times. In those moments, you need everyone to know exactly what their role is in order to recover fast. Moving to incident.io as the 7th engineer, from a scaleup of around 70 engineers, has given me a new perspective on what it means to own your code.

What SREs Can Learn from the Atlassian Nightmare Outage of 2022

What happens when the tools and services you depend on to drive Site Reliability Engineering turn out to be susceptible to reliability failures of their own? That’s the question that teams at about 400 businesses have presumably had to ask themselves this month in the wake of a major outage in Atlassian Cloud.

Whiskey and Wisdom: Justifying AIOps

Whiskey and Wisdom is a monthly executive-only forum where IT Operations leaders can network independently and discuss high-level AI operations and IT Ops strategies with their industry peers. In our most recent session, the discussion was around justifying AIOps—proving the value the technology brings to the table.

Incident Commanders: where are they now?

BigPanda gives the Incident Commander award to IT Ops superstars: people who go above and beyond in this high-pressure, critical line of work. In 2021, Ben Narramore, Director of Operations/Service Management at PlayStation, was a recipient for his ability to handle high-impact global incidents with exemplary professionalism and skill. Let’s find out what he’s been up to…

How to Make Your Incident Response Plan with Mattermost

For teams who deploy software to users around the world, every second counts when responding to outages and other incidents. It’s important that you have tools in your arsenal that are up to the challenge. Service monitoring, alerting, collaboration, and visibility are all essential components of a well-implemented incident response plan.