Uptime vs. Availability

Unlike physical stores and organizations that operate during set hours, the IT world never sleeps. In today’s highly connected digital environment, many believe that when an investment is made in technology, it should be accessible at all times — which is virtually impossible to guarantee. Since disruptions occur, organizations should evaluate the services needed to run operations smoothly. For example, what services are required during an IT service outage to ensure minimal disruptions?

Testing locally with CircleCI runners

Many development teams start their CI/CD journey with a local build box (or six) that runs their tests. In several mobile teams I worked on, for example, we had a few Mac Mini boxes with physical devices plugged in that we used for running local UI and unit tests. Eventually we migrated to a cloud-based solution, which brought us much greater stability and many new features. But moving to the cloud also meant our local hardware was obsolete.

Announcing lockc: Improving Container Security

The lockc project provides mandatory access controls (MAC) for container workloads. Its goal is to improve the current state of container/host isolation. The lockc team believes that container engines and runtimes do not provide enough isolation from the host, which I describe later in the “Why do we need it?” section. In this blog post, I’ll provide an introduction to lockc, discuss why you need it and show you how to try it out for yourself.

Efficient Container Monitoring with Pepperdata

Container monitoring strategies and purpose-built container monitoring tools just may be the next hot topics swirling around the Kubernetes discussion forums this year. Over 77% of IT professionals expected to migrate 50% or more of their workloads to containers with Kubernetes by the end of last year. With container usage on the rise, the ability to monitor the performance of your containerized workloads is critical.

Tonga downed by massive undersea volcanic eruption

On Saturday, the Pacific island nation of Tonga was decimated by a massive volcanic eruption that was visible from space. At 5:27pm local time, the underwater volcano Hunga Tonga-Hunga Ha’apai unexpectedly erupted, sending ash and debris for hundreds of miles. As of this writing, all internet and telephone communications between Tonga and the rest of the world are still down.

How to Monitor Calico's eBPF Data Plane for Proactive Cluster Management

Monitoring is a critical part of any computer system that has been brought into a production-ready state. No IT system exists in true isolation, and even the simplest systems interact in interesting ways with the systems “surrounding” them. Since compute time, memory, and long-term storage are all finite, it’s necessary at the very least to understand how these things are being allocated.

Azure Active Directory (Azure AD) - 101

This is a multi-part series that covers monitoring Microsoft Azure Active Directory (Azure AD). In this blog post, which is part 1 of the series, you will learn what Azure AD is and how it differs from an on-premises Active Directory (AD). As technology keeps evolving, companies increasingly look to technologies like cloud computing to expand, modernize and stay competitive, and in doing so they can expose themselves to risks.

Improve Incident Response by Getting Control of Your (Unintelligent) Swarm

Incidents happen. Things go wrong. Systems fail. Sometimes they fail in unexpected and dramatic ways that create Major Incidents. PagerDuty makes a very specific distinction between an incident and an Incident. Your organization may also make such a distinction. Determining if an incident is major or not can come down to a number of factors, or a specific combination of factors, like the number of services affected, the customer impact, and the duration of the incident.