Alerting

Protect Your Alerts: Why Incident Alert Management Shouldn't Share a Cloud

When managing IT infrastructure, one crucial aspect is ensuring that your incident alert management system remains operational during critical failures or outages. Relying on a single cloud provider for both your primary services and incident management can create a significant vulnerability. If that cloud provider experiences an outage, your alert management system could become inaccessible precisely when it’s needed most, leading to delayed responses and extended downtime.

Battle-Tested Reliability Strategies - Incidentally Reliable with Abhishek Ghosh

We dive into the trenches with Abhishek Ghosh, a veteran who has led SRE teams at Pinterest, and now at Cribl. He shares gripping war room stories from Pinterest, strategies for maintaining uptime, insights into the role of AI in observability, and more! Discover the future of SRE and learn how to navigate the challenges of digital reliability. Tune in to gain valuable lessons from one of the industry's leading experts.

A CoPE's Guide to Alert Management

Alerts are a perennial topic, and a CoPE will need to engage with them. The bounds of this problem space are formed by two types of alerts. Understanding what these alerts are and how to configure them is one thing. Thinking about what each does for your organization, and how choosing one or the other affects things, is another. The latter is the focus of this article.

Intelligent Alerting, Fewer Headaches: Insider View at ilert AIOps

You might have noticed that we released a series of AI-supported features last year. Intelligent alert grouping, developed to reduce alert fatigue, is the icing on the cake. With it, we combined all ilert AI features in a powerful new add-on that aims to reduce stress and give more clarity during IT incidents.

Tutorial: Integrating Grafana with Zenduty

Zenduty is a distributed, end-to-end major incident management platform for production engineering teams that helps you minimize downtime, implement scalable incident response processes, and institutionalize site reliability within your organization. Grafana is a multi-platform open source analytics and interactive visualization web application. It can produce charts, graphs, and alerts for the web when connected to supported data sources.

Tutorial: Integrating Prometheus with Zenduty

Zenduty is a distributed, end-to-end major incident management platform for production engineering teams that helps you minimize downtime, implement scalable incident response processes, and institutionalize site reliability within your organization. Alertmanager is a powerful component of the Prometheus ecosystem designed to handle alerts. It manages alerts by deduplicating, grouping, and routing them to the appropriate receiver integrations such as email, Slack, or custom webhooks.
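
To make the deduplicate, group, and route steps concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Alertmanager's implementation or its configuration format: the alert records, the `ROUTES` table, the receiver names, and the `fingerprint` and `process` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical alert records; label names and values are illustrative only.
alerts = [
    {"alertname": "HighCPU", "instance": "web-1", "severity": "critical"},
    {"alertname": "HighCPU", "instance": "web-1", "severity": "critical"},  # repeat
    {"alertname": "HighCPU", "instance": "web-2", "severity": "critical"},
    {"alertname": "DiskFull", "instance": "db-1", "severity": "warning"},
]

# Hypothetical routing table: severity -> receiver name.
ROUTES = {"critical": "pager-webhook", "warning": "slack"}
DEFAULT_RECEIVER = "email"


def fingerprint(alert):
    """Identify an alert by its full label set so exact repeats collapse."""
    return tuple(sorted(alert.items()))


def process(alerts, group_by="alertname"):
    # 1. Deduplicate: keep one alert per distinct label set.
    unique = {fingerprint(a): a for a in alerts}.values()

    # 2. Group: bundle alerts that share the grouping label.
    groups = defaultdict(list)
    for alert in unique:
        groups[alert[group_by]].append(alert)

    # 3. Route: pick a receiver per group (here, from the first member's severity).
    notifications = []
    for name, members in groups.items():
        receiver = ROUTES.get(members[0]["severity"], DEFAULT_RECEIVER)
        notifications.append(
            {"receiver": receiver, "group": name, "count": len(members)}
        )
    return notifications


if __name__ == "__main__":
    for note in process(alerts):
        print(note)
```

In this sketch the four incoming alerts become two notifications, one per alert name, which is the effect grouping is meant to have on notification volume.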

Smart Homes for Seniors' Health and Well-being

Smart homes can improve care for seniors by monitoring their health and well-being. These homes have advanced technology that tracks vital signs like heart rate, blood pressure, and oxygen levels. This real-time data helps caregivers and healthcare providers monitor seniors' health. Wearable devices and intelligent scales are part of the smart home system. They monitor things like weight changes, sleep patterns, and activity levels. The system can let caregivers and healthcare providers know if something seems off. This can help catch health problems early on.