
Latest News

Resilience in Action, E5: Tammy Bryant and Eric Roberts on the Importance of Glue Work

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Blameless Staff SRE Amy Tobey. Amy has been an SRE and DevOps practitioner since before those names existed. She cares deeply about her community of SREs and wants to take what she’s learned over the 20+ years of her career to help others.

Humanizing a DevOps Transformation

Anyone who’s ever played chess knows there’s more than one way to reach a desired outcome. There are 400 possible setups after the first turn; 197,742 after the second; and just north of 120 million after the third, all of which are marching toward the same desired outcome. “So, what does any of this have to do with DevOps?” you ask. Fair question.

Effective Communication Between Healthcare Professionals - Best Practices

Effective communication between healthcare professionals is critical for timely operations. In a modern healthcare environment, communication technologies connect healthcare professionals with other caretakers and healthcare entities, helping ensure the most effective, immediate care for patients.

Choosing the Right SRE Tools

Implementing SRE practices and culture can be challenging. Fortunately, there are a variety of tools for each aspect of SRE: monitoring, SLOs and error budgeting, incident management, incident retrospectives, alerting, chaos engineering, and more. In this blog, we’ll talk about what to look for in an SRE tool, and how these tools will help you on your journey to reliability excellence.

Keeping your teams and customers in the loop during downtime

Making your organization more transparent is not always an easy process. In our latest blog post, Adam Hammond shares some tips and tools that can help you get started when it comes to keeping your teams and customers in the loop during downtime. The core message is that you need to make communication a cultural pillar of your organization.

ChaosSearch Announces New Integration With Opsgenie

ChaosSearch is excited to announce its new integration with Opsgenie — Atlassian’s alerting and incident management platform. Using this integration, your teams can leverage the industry’s most powerful and comprehensive data monitoring and analytics capabilities channeled into a unified workflow through Opsgenie’s easy-to-use interface.

Look Upstream to Solve your Team's Reliability Issues

In “Upstream” by Dan Heath, we explore a variety of different problems ranging from homelessness, to high school graduation rates, to the state of sidewalks in different neighborhoods within the same city. In each of these examples, Dan discusses how upstream thinking decreased downstream work. Upstream thinking is characterized as proactive, collective actions to improve outcomes rather than reactions after an issue has already occurred.

Incident Management with Datadog

When your application experiences an outage, the tools your team uses to manage its response can make all the difference in how quickly they resolve the problem and avoid it in the future. An effective incident management workflow depends on accessible, integrated tools as well as clear, direct channels of communication. And, even after the matter’s been resolved, documentation and analysis of an outage are vital to ensuring it never happens again.

Performing Zabbix Alert Correlation and Incident Acceleration with CloudFabrix AIOps

The CloudFabrix AIOps 360 solution can ingest alerts, events, and metrics from various monitoring tools to perform event correlation, reduce alert noise, and accelerate incident resolution. In this blog I will cover how Zabbix integrates with our AIOps 360 solution. Zabbix is one of the popular open source monitoring platforms used by many enterprises and MSPs, including some of our customers.