Operations | Monitoring | ITSM | DevOps | Cloud

Blog

Datadog's Trace Outliers automatically surfaces error patterns across your environment

When monitoring highly distributed applications, which might rely on hundreds of services and infrastructure components across multiple cloud-based and on-premises environments, identifying problems and pinpointing the origin of an issue can be challenging. Even if you already have robust monitoring and alerts, your infrastructure and applications will likely change over time, which may make it difficult to reliably detect irregular behavior.

How to successfully correlate metrics, logs, and traces in Grafana

As everyone knows, the Grafana project began with a goal to make the dashboarding experience better for everyone, and to make it easy to create beautiful and useful dashboards like this one. But as Andrej Ocenas, a full stack developer at Grafana Labs, said in a recent FOSDEM 2020 presentation, the company has bigger ambitions for Grafana than just being a beautiful dashboarding application. What Grafana Labs is really aiming to do now is make Grafana into a full observability platform.

What's new in VMware vSphere 7

It finally happened. Many of us thought that vSphere 7 would be announced during VMworld last fall; instead, we got a series of “teasers” and strategy changes, and had to read between the lines to see what was coming in vSphere 7. Now it's settled: public release happens May 1, with a pre-release event on April 2. So what do we have to look forward to?

OpsLogix VMware MP: Introducing vSAN support

VMware vSAN is a core component for the delivery of your Software Defined Data Center. The newly added monitoring capabilities will support your IT-Operations with even better App-To-Cloud Visibility of your business-critical applications running in the VMware platform. When the new version of our Management Pack is available, a complete release note will be added with all the new exciting features. But for now, we can reveal some of the key components we monitor.

Configure custom SSL certificate expiration thresholds

When we first launched Oh Dear, we had a fixed certificate expiration timer: 14 days. As soon as the expiration date came within 14 days, we'd start sending a daily reminder to hurry up and renew those certificates. Our first exception was made when Let's Encrypt grew in popularity. We started sending notifications for Let's Encrypt certificates 7 days before the expiration date.
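The threshold rule described in this excerpt can be sketched in a few lines of Python. This is only an illustration of the logic, not Oh Dear's actual implementation: the function names, the issuer-string match, and the choice to treat "within 14 days" as "14 days or fewer remaining" are all assumptions.

```python
from datetime import date

# Thresholds come from the excerpt; everything else here is hypothetical.
DEFAULT_THRESHOLD_DAYS = 14
LETS_ENCRYPT_THRESHOLD_DAYS = 7


def threshold_days(issuer: str) -> int:
    """Return how many days before expiry the daily reminders start."""
    # Assumption: the issuer is identified by a simple substring match.
    if "let's encrypt" in issuer.lower():
        return LETS_ENCRYPT_THRESHOLD_DAYS
    return DEFAULT_THRESHOLD_DAYS


def should_notify(expires_on: date, issuer: str, today: date) -> bool:
    """True once the certificate is inside its reminder window."""
    days_left = (expires_on - today).days
    # Assumption: the boundary day itself already triggers a reminder.
    return days_left <= threshold_days(issuer)
```

A custom per-site threshold, as the article title suggests, would amount to replacing the two constants with a configurable value.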

Best Practices for Pragmatic Incident Command

The goal of this piece is to provide some practical advice on how teams can coordinate and respond to complex, dynamic incidents. After all, incidents are unplanned investments that surface valuable learnings for improvement. For the purposes of this blog, we define incidents as situations where there is a need for coordination among multiple people working on the same problem. There will be incidents where this is not the case.

How To Use & Leverage The Citrix End-User Activity Report

* Originally published in 2018, reposted as COVID-19 drives increased need for tracking remote worker activity and performance. A major government agency enacted a new policy allowing employees to work from home using Citrix during times of inclement weather. This policy introduced a new visibility challenge for the organization. Read on to discover how they fixed the visibility gaps with the Citrix End-User Activity Report from Goliath.

From Web Scale to Edge Scale: Rancher 2.4 Supports 2,000 Clusters on its Way to 1 Million

Rancher 2.4 is here – with new under-the-hood changes that pave the way to supporting up to 1 million clusters. That’s probably the most exciting capability in the new version. But you might ask: why would anyone want to run thousands of Kubernetes clusters – let alone tens of thousands, hundreds of thousands or more? At Rancher Labs, we believe the future of Kubernetes is multi-cluster and fully heterogeneous.

While You Work from Home, Double Down on Elasticsearch Security

As engineers, you and I have a responsibility to protect both our customers’ and our respective companies’ data. But unlike our office networks, which adhere to strict security protocols and a well-defined perimeter, our home networks usually fall short. And now that most of us are at home waiting out the COVID-19 pandemic, it’s time to revisit logging and Elasticsearch security while you work from home.