Recurring 'Service Restart' Remediation with Resolve Actions

In this demonstration we break down Resolve's incident automation, which identifies recurring IT issues by searching for previous incidents within a specific timeframe. If multiple incidents are detected, the automation flags the issue as chronic, updates the incident, and assigns it for further investigation rather than retrying the same fix endlessly. The result is faster resolution of recurring IT issues.
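The chronic-issue check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Resolve's actual API: the `Incident` class, the `handle_restart_incident` function, the 24-hour window, the threshold of two prior incidents, and the `problem-management` team name are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    id: str
    service: str
    opened_at: datetime
    chronic: bool = False
    assigned_to: Optional[str] = None
    notes: List[str] = field(default_factory=list)


def handle_restart_incident(
    current: Incident,
    history: List[Incident],
    window: timedelta = timedelta(hours=24),  # assumed lookback timeframe
    threshold: int = 2,                       # assumed "multiple incidents" cutoff
    team: str = "problem-management",         # assumed investigation queue
) -> str:
    """Return "escalate" for a chronic issue, otherwise "restart"."""
    cutoff = current.opened_at - window
    # Earlier incidents for the same service inside the lookback window.
    previous = [
        inc for inc in history
        if inc.id != current.id
        and inc.service == current.service
        and cutoff <= inc.opened_at < current.opened_at
    ]
    if len(previous) >= threshold:
        # Flag as chronic, update the incident, and hand it off
        # instead of restarting the service yet again.
        current.chronic = True
        current.notes.append(
            f"{len(previous)} earlier incidents for {current.service} "
            f"within {window}; marked chronic."
        )
        current.assigned_to = team
        return "escalate"
    return "restart"
```

Returning a decision string keeps the remediation step (the actual service restart) separate from the detection step, so the same check could sit in front of any automated fix.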

#026 - Kubernetes for Humans Podcast with BJ Badyk (Nexxen)

BJ Badyk is a human who desires an easier life. Nerd from birth, his curiosity led him down a path through the start of ISPs, Silicon Valley during the dot-com bubble, the last few years of the Playboy brand, and into the world of Adtech. He currently runs the platform engineering team at Nexxen, where they work on unique ways of handling millions of requests per second with Kubernetes. The team was an early adopter of Talos Linux, which they now run at scale. He presented at TalosCon 2023 and continues to pursue simple solutions to complex problems.

Why "why" is the wrong question to be asking after incidents with Dennis Henry of Okta

In last week’s episode of The Debrief, we had on Colette Alexander, Director of Engineering at HashiCorp, to discuss some of the myths around incident response. One of the myths we spoke about in that conversation was the idea that asking “why” is better than asking “how.” In reality, asking “how” lets you focus on the contributing factors that led to an incident, whereas “why” tends to single out a person, which can lead to a lot of blame.

Migrating into the Future: A step-by-step guide to leaving your legacy NMS behind

Kentik's Josh Mayfield and Phil Gervasi dive into the essential steps and strategies for transitioning from traditional network management systems to more advanced, future-ready solutions. Learn how to update your network monitoring tools to adapt to the evolving demands of modern networks, understand the importance of streaming telemetry over SNMP, and get insights on leveraging new telemetry protocols. Whether you're looking to update your network's infrastructure or simply curious about the latest in network monitoring technology, this webinar is packed with valuable insights and practical advice.

The 4 Biggest Challenges of Scaling Cloud-Native AI Workloads

When working with AI in cloud environments, traditional data provisioning and software testing methods don't work because of the behavior of AI and LLM APIs. In this Cloud Native Computing Foundation (CNCF) webinar recording, we discuss the top 4 challenges of scaling cloud-native AI workloads, and the solutions developers are turning to instead.

Managing cloud carbon emissions: A joint initiative by Aiven and Thoughtworks

Did you know that the tech sector is responsible for around the same volume of carbon emissions as the aviation industry? Cloud computing relies on large data centers and data transmission networks, making it one of the leading sources of energy use and carbon emissions in tech. Moreover, accurate cloud emissions data is often unreliable or hard to access, which complicates managing that data alongside inclusive climate action. Aiven and Thoughtworks are driving a vision to tackle this issue head-on.

The Unplanned Show, Episode 32: Platform Engineering with Paula Kennedy

Supporting developer velocity AND operational efficiency, stability, and security doesn't happen by accident. In this episode, Dormain will sit down with Paula Kennedy to discuss how platform engineering helps businesses go faster, decrease risk, and increase efficiency.