10 Years of Failure Friday at PagerDuty: Fostering Resilience, Learning and Reliability

In today’s fast-paced and ever-evolving world of technology, failure is inevitable. Organizations should embrace failure as an opportunity to learn how to build and deliver more resilient services. At PagerDuty, we’ve practiced Failure Friday for 10 years now. Failure Friday, a practice inspired by chaos engineering, involves intentionally injecting failures into our systems to improve reliability and foster a proactive engineering culture.

ClickOps over GitOps - Civo Navigate NA 2023

In this presentation, Laszlo Fogas, the founder of Gimlet.io, introduces "ClickOps over GitOps." Discover how ClickOps revolutionizes cloud operations by enabling infrastructure-as-code changes through dashboard actions. Learn about the benefits of this approach and its role in platform engineering, and see practical use cases with live demos. Streamline your DevOps processes and avoid configuration drift by exploring ClickOps in this Navigate NA 2023 talk.

Lessons learned from integrating OpenAI into a Grafana data source

Interest in generative AI and large language models (LLMs) has exploded thanks to a slew of announcements and product releases, such as Stable Diffusion, Midjourney, OpenAI’s DALL-E, and ChatGPT. The arrival of ChatGPT in particular was a bellwether moment, especially for developers. For the first time, an LLM was readily available and good enough that even non-technical people could use it to generate prose, rewrite emails, and generate code in seconds.

A Look at the Top 7 IT Automations for Highly Effective Organizations

More than three decades have passed since Stephen R. Covey made the world highly effective. His 1989 bestseller, The 7 Habits of Highly Effective People, inspired just about everyone. How could anyone pass up the chance to become highly effective by adopting just seven new habits!?

Application Performance Monitoring vs Application Performance Management: Understanding the Differences

Ensuring optimal application performance is a Herculean task teed up for today’s IT operations teams. Adding to the confusion is the acronym, APM, shared by the two most common practices. While the terms are similar, the approaches and use cases are different.

The Unplanned Show, Episode 6: Defining AIOps with Heath Newburn

“AIOps” is a term some love to hate, but what makes it useful? In this episode, Heath Newburn breaks down the three things to look for in an AIOps solution: reduce noise, create context, and reduce toil. He also explains the challenges of domain-specific versus domain-agnostic approaches to AIOps. But even within that approach, Heath warns of “gotchas” in rules “tech debt”, data formats, and long overall implementation times.

How to Install Sematext Experience on WordPress | Real User Monitoring on WordPress

WordPress websites have undeniable benefits, but do you have access to all the data you need to make critical business decisions and enhance your site's performance? With Sematext Experience, you gain valuable insights into your users' business journeys, track page load times, monitor HTTPS requests, and uncover a wealth of other crucial metrics.

Fastest Time-to-Value Anomaly Detection in Splunk: The Splunk App for Anomaly Detection 1.1.0

Anomaly detection in metrics or time series data is the most used machine learning use case among Splunk Security and Observability customers. Customers are looking for easy-to-use, ML-powered, high-fidelity anomaly detection so that they can be alerted at the first sign of a failure point or security incident.