Operations | Monitoring | ITSM | DevOps | Cloud

How to Control Alert Fatigue?

Alerts are indispensable to any IT operations system today. Site reliability engineers (SREs) or ITOps executives set up several monitoring tools for their IT landscape. When there is a change, high-risk action, or outage in any of these incidents, the monitoring tool triggers an automated alert. This could happen on the monitoring tool’s dashboard itself, via email, or enterprise collaboration tools like Slack or Teams.

The Five Data Pillars of Effective Root-Cause Analysis

The most effective way to understand an incident, resolve it and prevent it from occurring again is root-cause analysis. Simply put, root-cause analysis is the study performed by ITOps teams or site reliability engineers (SREs) to pinpoint the exact element/error that caused the unexpected behavior. Based on this, they plan remediation. Accurate and timely root-cause analysis can have a direct impact on the company’s top and bottom line.

Spotting and Avoiding Database Drift

Managing any database ecosystem is difficult enough: taking backups, maintaining statistics, and doing performance tuning all tax the time of the DBA or database developer. The job is complex even without considering the work you do to manage the various schema and data drifts that can occur. Unless you operate in a vacuum or within a single person organization (and even then, schema drift can occur), drift is going to manifest naturally and as the size of the environment expands.

A Question of When vs If: The Need for Your Security Incident Management Plan

Should all incidents be treated the same? Seems like a simple question, but the answer can have big implications. Think about an employee who contacts the service desk, complaining they can’t log onto their email. If the issue is due to a ‘stale’ password, dropped connection or configuration issue after an update for the email server, then the impact on the organization can be quantified to the lost productivity for the impacted employee or employees.

Tracing AWS Lambdas with OpenTelemetry and Elastic Observability

Open Telemetry represents an effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries. Recently, OpenTelemetry became a CNCF incubating project, but it already enjoys quite a significant community and vendor support. OpenTelemetry defines itself as “an observability framework for cloud-native software”, although it should be able to cover more than what we know as “cloud-native software”.

10 Most Dangerous New Malware and Security Threats in 2021

Cyberthreats including malware, viruses, and other security hazards are constantly evolving and becoming more dangerous and harder to detect. This makes it quite difficult to keep your data and information protected nowadays. Unless you are sure that you are absolutely protected, which is wishful thinking, you remain at risk of attacks by the latest strains of malware and security threats.

Painting a Complete Network Monitoring Picture: Why Context is Critical

In order to produce their masterpieces, artists like van Gough, Rembrandt, Picasso, and Monet painted with more than just one color. Being able to choose from multiple colors (not to mention an abundance of talent, inspiration, and creativity) is what allowed these artists to see their complete vision come to life on canvas. However, if you’re relying on a single set of data to troubleshoot network issues, it’s like you’re stuck painting with one color.

5 Best IT Experience Practices Your Team Can Make Today

If you were to put 100 enterprise tech leaders in a room together and ask them if they think their company’s employee experience is dependent upon IT, I’m certain all would agree it is. But I’m also certain those 100 wouldn’t know: For IT decision-makers, the devil is in the details. Many are judged by uncompromising Service Level Agreements (SLAs) and shoddy survey data, not comprehensive digital experience trends and indexes.

KVM hypervisor: a beginners' guide

KVM (Kernel-based Virtual Machine) is the leading open source virtualisation technology for Linux. It installs natively on all Linux distributions and turns underlying physical servers into hypervisors so that they can host multiple, isolated virtual machines (VMs). KVM comes with no licenses, type-1 hypervisor capabilities and a variety of performance extensions which makes it an ideal candidate for virtualisation and cloud infrastructure implementation.