Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

A Day in the Life: Intelligent Observability at Work with our SRE, Dinesh

When I asked Charlie for permission to attend this year’s AICon (virtual, natch) I thought it would be a shoo-in; learning’s part of my OKRs after all. But he never makes things easy and his ‘yes’ came with a caveat that’s typical when dealing with him. This time, he claimed he didn’t have the budget for the ticket (a likely story!) and I’d have to find another way to get one.

WTF is Incident Management? Post-Panel Wrap-Up

That's a wrap! We hosted "WTF is Incident Management" on May 12, 2021. We invited four very knowledgeable panelists to discuss how they define incident management, what changes they'd make if they could start again from scratch, how to manage team stress after an incident, and other subjects. Our panelists were: host Matt Stratton (Staff Developer Advocate at Pulumi), Emily Ruppe (Incident Commander at Twilio), Alina Anderson (Sr.

Enterprise Alert Alarm Center. A NOC's best friend.

Over time, Enterprise Alert continues to grow and more and more teams are starting to benefit from Enterprise Alert’s reliable alerting. As part of this process, Enterprise Alert almost always becomes a central component of the NOC and has practically trained the NOC admins. For this reason, here in support we rarely have the pleasure of presenting the features of our alarm center.

New Event Source - Website Monitoring

Enterprise Alert is constantly evolving to provide our customers with new ways to implement event sources and use new features. With version 9, several new features have been implemented that make it easier for customers to create alerts for specific processes and events. These include the new “Website Monitoring” event source.

Understanding a Microsoft Service Outage

Maintaining business continuity when an issue arises has proven to be a challenge many organizations struggle with. A global pandemic being thrown into the mix in Q1 of 2020 (one that many businesses are still navigating through) introduced a new set of problems for both service providers and businesses reliant on those services.

Enhance NOC Alerts With Incident Management and Alert Automation

In a network operations center (NOC), alerts originating from hundreds of servers, application monitoring systems, emails and ticketing services compete to catch a NOC analyst’s attention. NOCs face many challenges in parsing through alerts to identify actionable notifications and mobilize the right response team into action.

SRE Leaders Panel: Business Agility is what matters, SRE can help you get there

Blameless recently had the privilege of hosting SRE leaders Garima Bajpai, Founder at Community of Practice - DevOps Canada and Jason Fraser, Delivery Lead at VMware Tanzu to discuss the value of crisis during incident response, the best and worst tech transformations they’ve seen, how reliability impacts the flow of value, and more.

Concrete Steps to Reducing MTTR

In today’s data-centric world, metrics or numbers define all performance benchmarks. The time between when an event starts and ends shows how well a system can handle and process such events. One of such metrics is MTTR. MTTR usually stands for Mean Time To Resolution, but it has held several meanings over the years. MTTR is a metric used to measure how well a system can bounce back from errors and provide long-lasting solutions.