The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Ubuntu AI | S2E3 | GPU utilisation optimisation at KubeconEU 2024

Maciej is not only the host of our podcast but also an experienced keynote speaker. After a joint keynote at KubeconEU 2023 about highly sensitive data, Maciej goes to Paris in 2024 to talk about GPU utilisation. During our podcast, we cover many aspects of GPU utilisation. From best practices to existing tooling, Maciej talks about the topic from different angles, giving a sneak peek into his keynote. Are you curious how open source tooling plays a role in optimising GPU utilisation? Listen to our podcast!

Advice for building an incident management program

On this week's episode of The Debrief, we chatted with Jeff Forde, an Architect on the Platform Engineering team at Collectors. With a background spanning finance, healthcare, and various product-led startups, Forde has honed his expertise in DevOps, site reliability, and platform engineering. Beyond his professional life, he's also a dedicated volunteer first responder and certified fire instructor in Connecticut, offering him a unique perspective on managing incidents of all types.

Azure Cost Management and FinOps: Lessons from the Frontlines

This episode of "FinOps on Azure" dives into the crucial issue of managing Azure costs effectively. It addresses the common challenges organizations face in controlling their Azure spending and offers insights and strategies to prevent unexpected overspending. Through real-world experiences shared by Saravana Kumar, CEO of Kovai.co, viewers can gain valuable lessons on optimizing Azure consumption and establishing robust cost governance practices.

The Debrief: How to level up your incident management program with Jeff Forde of Collectors

Today, incident management is a core part of organizations both big and small. But what if you don't have a program in place? Where do you start? Or what if incident management is already a key part of your org, but you're looking to optimize it? Where do you kick things off in that case? Consider another situation: what if you're an established organization with years of incident management experience? What are some things you can do to take things to the next level?

The Value Hosted Graphite brings to the Heroku Marketplace

Hosted Graphite is a time-series metrics monitoring tool used for application, systems, infrastructure and network monitoring. It is a hosted Graphite service that offers the full capabilities and benefits of Graphite, without the hassle of setting up your own open-source Graphite installation.

How to Monitor ClickHouse With Telegraf and MetricFire

Monitoring your ClickHouse database is a proactive measure that helps maintain its health and ensures that it continues to meet the needs of your applications and users efficiently. It allows you to address issues before they become critical, keeping your database environment secure, reliable, and performing optimally. In this article, we'll detail how to use the Telegraf agent to collect performance metrics from your ClickHouse clusters and forward them to a data source.