Zenduty

Prometheus for multi-cluster setups

Jun 6, 2020 By Ankur Rawal In Zenduty

This tip is for those who use Prometheus federation to monitor multiple clusters. How should Alertmanager be configured for multiple clusters? Say an issue in cluster A should send an alert only for cluster A. In such cases, every alert should be routed to the proper team based on its labels (if there is a problem with application A on cluster B, the team responsible should be notified). In the above case, two alerts are triggered by the same rule.
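The label-based routing described above can be sketched in a few lines of Python. This is only an illustration of first-match routing on labels, loosely modeled on how a routing tree picks a receiver; it is not Alertmanager's configuration format or code. The label names (`cluster`, `app`), the team names, and the `pick_receiver` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One routing rule: every matcher label must equal the alert's label."""
    receiver: str
    match: dict = field(default_factory=dict)


def pick_receiver(labels, routes, default="default-team"):
    """Return the receiver of the first route whose matchers all match."""
    for route in routes:
        if all(labels.get(key) == value for key, value in route.match.items()):
            return route.receiver
    return default


# Hypothetical label and team names, chosen only for illustration.
# More specific routes come first because the first match wins.
ROUTES = [
    Route("team-app-a", {"cluster": "cluster-b", "app": "app-a"}),
    Route("team-cluster-a", {"cluster": "cluster-a"}),
]

# Two alerts fired by the same rule, distinguished only by their labels.
alert_1 = {"alertname": "HighErrorRate", "cluster": "cluster-a", "app": "app-c"}
alert_2 = {"alertname": "HighErrorRate", "cluster": "cluster-b", "app": "app-a"}

print(pick_receiver(alert_1, ROUTES))
print(pick_receiver(alert_2, ROUTES))
```

Because both alerts share the same `alertname`, only the `cluster` and `app` labels decide which team is notified, which is why consistent labeling across clusters matters in a setup like this.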

Trust-building elements to increase conversion rates

Jun 2, 2020 By Vishwa Krishnakumar In Zenduty

To build a pipeline with great conversion rates, you must integrate a number of design and copy updates into your application funnel for trust-building and user empowerment. These are also called service evidence, a term that comes from The Design of Everyday Things by Don Norman.

Zenduty - Incident Priorities and SLAs

May 29, 2020 By Zenduty In Zenduty

Incident SLAs in Zenduty let you set acknowledgement and resolution SLAs for your incidents. SLAs allow your teams to prioritize incidents and increase transparency among incident stakeholders: support, account managers and management. Incident priority is the order in which an incident or problem needs to be resolved, based on impact and urgency. Priority also defines the response and resolution targets associated with Service Level Agreements. Each team in Zenduty can define its own priorities, such as P0/P1/P2/P3 or L0/L4/L16.
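As a rough sketch of how a priority can follow from impact and urgency and then carry its own SLA targets, consider the Python below. It does not describe Zenduty's actual behavior or API: the three-level scale, the scoring rule, the `priority` and `sla_for` helpers, and every SLA duration are hypothetical values picked for illustration.

```python
from datetime import timedelta

# Assumed three-level scale for both impact and urgency (most severe first).
LEVELS = ("high", "medium", "low")

# Hypothetical SLA targets per priority: (acknowledge within, resolve within).
SLA_TARGETS = {
    "P0": (timedelta(minutes=5), timedelta(hours=1)),
    "P1": (timedelta(minutes=15), timedelta(hours=4)),
    "P2": (timedelta(hours=1), timedelta(hours=24)),
    "P3": (timedelta(hours=4), timedelta(hours=72)),
}


def priority(impact, urgency):
    """Combine impact and urgency into a priority label from P0 to P3."""
    score = LEVELS.index(impact) + LEVELS.index(urgency)  # ranges from 0 to 4
    return f"P{min(score, 3)}"


def sla_for(impact, urgency):
    """Return the priority and its (acknowledge, resolve) targets."""
    label = priority(impact, urgency)
    return label, SLA_TARGETS[label]


label, (ack, resolve) = sla_for("high", "medium")
print(label, ack, resolve)
```

A team using a different scheme, such as the L0/L4/L16 labels mentioned above, would swap in its own labels and targets; the point is only that one priority value drives both the ordering of work and the SLA clock.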

Using context to triage change-triggered incidents

May 27, 2020 By Vishwa Krishnakumar In Zenduty

One of the first things incident managers do when they get an alert page from Zenduty is check the “Context” tab of the incident. Incident context is critical to getting a first responder’s view of what happened and what could have caused it. Context tells you what happened before an incident. For 40–50% of all incidents, Zenduty’s incident context can tell you within 5–10 seconds what could have caused the incident.

Real-time alerts from Zabbix and escalation with Zenduty

May 21, 2020 By Vishwa Krishnakumar In Zenduty

Recently, one of our customers, a 20-member NOC team at a large B2C company, set up Zabbix to monitor a network of more than 1,000 servers, routers, and switches. The NOC team wanted to set up alerting, on-call scheduling, and an escalation matrix for whenever a critical network component encountered downtime. The team used Slack as its primary communication channel and Zoom for real-time communication. For NOC teams like this one running a very large operation, setting up alerting can be very tricky.

Accelerating your Zendesk customer support response times by 50% and meeting support SLAs

May 10, 2020 By Vishwa Krishnakumar In Zenduty

Zendesk is one of the most popular ticketing support and customer service platforms available in the market. Two metrics that measure the effectiveness of your customer support are the response and resolution times — how soon are you able to respond to a customer ticket, and how soon are you able to mobilize relevant personnel, perform necessary remediation tasks and finally resolve the ticket.

Monitoring service health and downtime events within your Google Cloud with Zenduty

May 1, 2020 By Vishwa Krishnakumar In Zenduty

Google Cloud Platform (GCP) is a collection of Google’s computing resources, made available via services to the general public as a public cloud offering. The GCP resources consist of physical hardware infrastructure (computers, hard disk drives, solid-state drives, and networking) contained within Google’s globally distributed data centers, where many of the components are custom designed using patterns similar to those available in the Open Compute Project.

Sending Azure Monitor outage notifications to Microsoft Teams

Apr 24, 2020 By Vishwa Krishnakumar In Zenduty

Microsoft Azure is a cloud computing service providing infrastructure as a service (IaaS), software as a service (SaaS) and platform as a service (PaaS). It supports multiple Microsoft-specific and third-party services and systems, has 90+ compliance offerings, and is trusted by 95% of Fortune 500 companies to run their business. What is system downtime, and how does it affect me or my business?

Azure service health alerts and escalation with Zenduty

Apr 23, 2020 By Vishwa Krishnakumar In Zenduty

Microsoft Azure is a cloud computing service providing infrastructure as a service (IaaS), software as a service (SaaS) and platform as a service (PaaS). It supports multiple Microsoft-specific and third-party services and systems, has 90+ compliance offerings, and is trusted by 95% of Fortune 500 companies to run their business. What is system downtime, and how does it affect me or my business?

Grafana alerts and incident escalation with Zenduty

Apr 22, 2020 By Vishwa Krishnakumar In Zenduty

Grafana is one of the most popular open-source visualization tools. It can be used on top of a variety of data stores but is most commonly used with Graphite, InfluxDB, Prometheus, Elasticsearch, AWS CloudWatch, and many others. Reliability engineers use Grafana for its ability to bring several data sources together in a unified dashboard and increase the observability of production systems.
