Latest News

HoneyByte: Incremental Instrumentation Beyond the Beeline

“It turns out,” said Liz, “it was not a giant pile of work to start adding those rich instrumentation spans as you need them.” Liz Fong-Jones was telling dev.to’s Molly Struve about an error she encountered while trying to update her dev.to profile. When she entered honeycomb.io into the Employer URL field, the app responded with an angry red box...

The Uptime.com Report for 2019

Unplanned downtime can drive significant losses in the form of unrealized revenue. Teams may be caught off guard, or may face an outage outside their control, extending downtime hours unnecessarily. Without automated monitoring and alerting, teams face undetected outages that silently threaten SLA fulfillment. The recommendations in this report are best used as a guide on what trends may drive Site Reliability Engineering in the near term.

Closer Look: Observability

As enterprise IT systems have become more complex and distributed due to cloud infrastructure, containers, serverless technology, an ever-growing footprint of applications and devices, IoT, SDN, open source development tools and more, the practice of performance monitoring has become far more nuanced. In these modern IT environments, traditional monitoring practices centered on known issues aren’t enough.

6 Common Mistakes in AWS EC2 and Azure Cloud VM Optimization

No matter what’s driving your move to an AWS or Azure cloud, two things are true. One, you don’t want to under-provision, which could create performance and availability issues. And two, you don’t want to overpay, because no one ever wants to do that. One of the key decisions you must make is which Amazon EC2 or Microsoft Azure virtual machine instance configuration you need. It’s a scoping exercise, but several factors make this easier said than done.

Remote Working: Encrypt 15k Devices in 3 Days? No problem.

Right now, millions of people are working remotely for the first time, and they’re doing so on company laptops and mobile devices. With millions of these devices now offsite, tech support’s security plans gain one more wrinkle: in addition to worrying about insecure networks and malware attacks, IT must also safeguard against physical theft. Yes, device encryption is the logical fail-safe for such a scenario and a must-have for any remote IT setup.

Handling the emerging security challenges and possible concept change

With the current global crisis spreading into multiple areas of information technology, it is crucial to learn how security-related areas are affected and what that would mean for the entire IT industry. Remote access to network resources increases the load on both new and existing tools that allow most activity to be performed remotely (to grasp the possible scale of the impact, read, for example, about the recent Zoom service controversies).

Understanding and Baselining Network Behaviour using Machine Learning - Part I

Managing a network more effectively has been something our customers have been asking us about for many years, but it has become an increasingly important topic as working from home becomes the new normal across the globe. In this blog series, I thought I’d present a few analytical techniques that we have seen our customers deploy on their network data to (1) better understand their network and (2) develop baselines for network behaviour and detect anomalies.

Understanding and Baselining Network Behaviour using Machine Learning - Part II

A difficult question we come across with many customers is ‘what does normal look like for my network?’. There are many reasons why monitoring for changes in network behaviour is important, with some great examples in this article - such as flagging potential security risks or predicting potential outages.

Colonel Mustard in the Library with Microservices APM

As many of us are rediscovering an interest in board games, it feels relevant to make reference to Hasbro’s classic Clue. Understanding what’s going right or wrong in your sprawling digital business can feel a lot like a murder mystery: it was the authentication service in the east region with the memory exhaustion error. This analogy has a weakness when applied to modern operations. The Clue board game had 6 weapons, 6 suspects, and 9 rooms. That’s 324 combinations.
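The 324 figure is simply the product of the three counts quoted above. A minimal Python sketch of that arithmetic follows; the second set of counts (services, regions, error types) is purely hypothetical, chosen only to illustrate how quickly the search space grows, and does not come from the article.

```python
from itertools import product

# Counts taken from the blurb: 6 weapons, 6 suspects, 9 rooms.
weapons, suspects, rooms = range(6), range(6), range(9)

# Every (weapon, suspect, room) triple is one possible accusation.
clue_combinations = len(list(product(weapons, suspects, rooms)))
print(clue_combinations)

# Hypothetical counts for a modern system (assumed for illustration only).
services, regions, error_types = 50, 4, 20
ops_combinations = services * regions * error_types
print(ops_combinations)
```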

Visualizing observability with Kibana: Event rates and rate of change in TSVB

When working with observability data, a good portion of it comes in as time series data — things like CPU or memory utilization, network transfer, even application trace data. And the Elastic Stack offers powerful tools within Kibana for time series analysis, including TSVB (formerly Time Series Visual Builder). In this blog post, I’m going to attempt to demystify rates in TSVB by walking through three different types: positive rates, rate of change, and event rates.
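To make the three terms concrete, here is a rough plain-Python sketch of what each kind of rate usually means. It is not TSVB's implementation and uses no Kibana or Elasticsearch API; the function names, the sample data, and the choice to mark counter resets as None are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def rate_of_change(samples: List[Sample]) -> List[float]:
    """Per-second change between consecutive samples; can be negative."""
    return [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]


def positive_rate(samples: List[Sample]) -> List[Optional[float]]:
    """Rate for an ever-increasing counter: negative steps (for example a
    counter reset) are reported as None instead of a misleading dip."""
    return [r if r >= 0 else None for r in rate_of_change(samples)]


def event_rate(event_times: List[float], start: float, end: float,
               bucket_seconds: float) -> List[float]:
    """Events per second in each fixed-width time bucket of [start, end)."""
    n_buckets = int((end - start) // bucket_seconds)
    counts = [0] * n_buckets
    for t in event_times:
        idx = int((t - start) // bucket_seconds)
        if 0 <= idx < n_buckets:
            counts[idx] += 1
    return [c / bucket_seconds for c in counts]


# Hypothetical data: a byte counter that resets between t=10 and t=20.
counter = [(0, 100), (10, 150), (20, 30), (30, 90)]
print(rate_of_change(counter))   # [5.0, -12.0, 6.0]
print(positive_rate(counter))    # [5.0, None, 6.0]

# Hypothetical event timestamps, bucketed into 10-second windows.
print(event_rate([1, 2, 3, 12, 25, 26], start=0, end=30, bucket_seconds=10))
```

The difference between the first two functions is the point: a plain rate of change is appropriate for gauges such as memory utilization, while a positive rate suits counters that only ever grow except when they reset.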