
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

SD-WAN Monitoring Survival Guide: Be the Master of Your Network

SD-WAN technology is a hot topic in the networking world, with many businesses transitioning to SD-WAN for the promise of improved performance and reliability. However, after migrating, numerous companies find that they lack visibility into their SD-WAN. This makes it difficult to identify and address performance issues and to determine whether their SD-WAN service is meeting expectations. Are you tired of feeling like you're driving blindfolded when it comes to your company's network?

Tips and best practices for Docker container management

The arrival of Docker container technology brought with it an impressive array of capabilities. By encapsulating an entire software package, including its dependencies and libraries, into a single, portable container, Docker has made deployment across platforms such as AWS, Google Cloud, Microsoft Azure, and Apache a straightforward process. When people talk about Docker, they usually mean Docker Engine, the runtime that allows you to build and run containers.

What are the 4 DORA metrics, and what do they mean for Ops teams?

Performance monitoring has become increasingly important for operations teams in today’s rapidly changing digital landscape. The DORA metrics are essential tools used to measure the performance of a DevOps team and ensure that all members work efficiently and collaboratively toward their goals. Here, we’ll explore what exactly DORA metrics are, how they work, and why companies should be paying attention to them if they want to set up an effective DevOps environment.

Write Loki queries easier with Grafana 9.4: Query validation, improved autocomplete, and more

Every successful data exploration journey begins with constructing a query. So, with this latest Grafana release, we are proud to introduce several new features aimed at improving the Grafana Loki querying experience. From query expression validation to query history in the code editor and more, these updates are sure to make querying in Grafana even more efficient and intuitive, saving you time and frustration.

The Incident Commander Role: Duties & Best Practices for ICs

Imagine that a critical incident (a major outage, cyberattack, or disaster) occurs out of nowhere at your company. In such a case, you'll try to minimize the damage and get back to normal operations as quickly as possible. But how will you do that if no one on your team knows how to manage such incidents? This is where incident commanders come in. They're trained professionals who lead the response to critical incidents.

Datadog On Reliability Engineering

There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritization flows, there’s no golden path for how to organize things. As Datadog has shifted from a startup to a quickly growing public company, we’ve seen our own SRE practice evolve. With over 22,000 customers sending trillions of data points each day, keeping Datadog reliable is critical to our business.