Operations | Monitoring | ITSM | DevOps | Cloud

Twelve Key Learnings from PagerDuty People Team's Generative AI HackWeek

Sometimes innovation requires ideas unconstrained by traditional structures and removed from day-to-day responsibilities. It was in this spirit that PagerDuty’s People HackWeek–a friendly competition to explore how generative AI might impact the future of HR–was born.

Monitoring Kubernetes tutorial: Using Grafana and Prometheus

Behind the trends of cloud-native architectures and microservices lies a technical complexity, a paradigm shift, and a rugged learning curve. This complexity manifests itself in the design, deployment, and security, as well as everything that concerns the monitoring and observability of applications running in distributed systems like Kubernetes. Fortunately, there are tools to help developers overcome these obstacles.

Graphite vs Prometheus

Graphite and Prometheus are both great tools for monitoring networks, servers, other infrastructure, and applications. Both Graphite and Prometheus are what we call time-series monitoring systems, meaning they both focus on monitoring metrics that record data points over time. At MetricFire we offer a hosted version of Graphite, so our users can try it out on our free trial and see which works better in their case.

Top 5 Resiliency Trends of 2023

In today’s world, resilience is no longer a conditioned desire or methodology to try but has become a necessity for sustained success in software development and IT operations. As DevOps and Agile teams keep moving forward to cross boundaries, come up with new methodologies, and drive innovation, it is now important to have the ability to quickly recover from failures, adapt to changing conditions, and maintain high performance under pressure.

LogicMonitor Excels in G2 Fall 2023 Network Monitoring Report

Fall 2023 Reports, including Enterprise Monitoring and Cloud Infrastructure Monitoring, were announced September 12, 2023 from G2, the world’s leading business software review platform. Take a look inside the G2 Fall 2023 Network Monitoring Report highlights below to see where LogicMonitor stood out among the rest.

Azure Integration Automates Asset Discovery on Tidal Accelerator

We’re excited to share an update on our Microsoft Azure integration that automates discovery and mapping of key cloud assets into Tidal Accelerator. Tidal has enabled a new integration that pulls information on Azure Virtual Machines (VMs), Azure App Service, and Azure Database instances, Elastic Pools and Servers, directly into Tidal Accelerator for further analysis.

Revolutionize Data Ingestion: Introducing Terraform Support for Splunk Cloud Platform

Splunk Cloud Platform has always been a powerful platform for aggregating, analyzing, and extracting actionable insights from your machine-generated data. As data volumes continue to grow exponentially, efficiently managing the ingestion of data into Splunk becomes crucial. To address this need, we are thrilled to announce the debut of Terraform support for the Splunk Cloud Platform.

Deploying a multi-availability zone Kubernetes cluster for High Availability

Many cloud infrastructure providers make deploying services as easy as a few clicks. However, making those services high availability (HA) is a different story. What happens to your service if your cloud provider has an Availability Zone (AZ) outage? Will your application still work, and more importantly, can you prove it will still work? In this blog, we'll discuss AZ redundancy with a focus on Kubernetes clusters.

LLMs Demand Observability-Driven Development

Our industry is in the early days of an explosion in software using LLMs, as well as (separately, but relatedly) a revolution in how engineers write and run code, thanks to generative AI. Many software engineers are encountering LLMs for the very first time, while many ML engineers are being exposed directly to production systems for the very first time.

Seamlessly correlate DBM and APM telemetry to understand end-to-end query performance

When the services in your distributed application interact with a database, you need telemetry that gives you end-to-end visibility into query performance to troubleshoot application issues. But often there are obstacles: application developers don’t have visibility into the database or its infrastructure, and database administrators (DBAs) can’t attribute the database load to specific services.