Latest News

Kubernetes Liveness Probes: A Complete Guide

Kubernetes probes are essential tools for maintaining the health and reliability of applications running in containers. Among these, the liveness probe checks whether an application is still running correctly. If the probe keeps failing, Kubernetes automatically restarts the affected container, which helps keep the application available without manual intervention.
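
As a rough illustration of the restart behaviour described above, the sketch below models how a run of failed liveness checks could trigger a restart. It is a simplified simulation, not kubelet code: the function name `should_restart` and the list-of-booleans input are assumptions made for this example, while `failure_threshold` mirrors the probe's `failureThreshold` field, which defaults to 3.

```python
from typing import Iterable


def should_restart(probe_results: Iterable[bool], failure_threshold: int = 3) -> bool:
    """Return True once `failure_threshold` consecutive probe failures occur.

    `probe_results` is a hypothetical stand-in for successive liveness checks:
    True means the check passed, False means it failed.
    """
    consecutive_failures = 0
    for healthy in probe_results:
        if healthy:
            # A single success resets the failure counter.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return True
    return False


if __name__ == "__main__":
    # Three failures in a row reach the threshold.
    print(should_restart([True, False, False, False]))
    # A success in between resets the count, so no restart is triggered.
    print(should_restart([False, False, True, False]))
```

The counter resets on any success, so an occasional failed check on its own does not cause a restart in this model.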

What are developer experience metrics?

Good software development teams are focused on outputs, and can bring key metrics to bear that illustrate just what the engineering organization is building on a daily, monthly and yearly basis. Developer productivity is often assessed retrospectively: if the team is hitting DORA metrics, we assume everything in the lifecycle before production is sound. But the best teams dig deeper, and aim to solve the problem backwards as well as forwards by looking at the process as well as the results.

Canonical announces the general availability of Charmed Kafka

27 February 2024: Today, Canonical announced the release of Charmed Kafka – an advanced solution for Apache Kafka® that provides everything users need to run Apache Kafka at scale. Apache Kafka is an event store that supports a range of contemporary applications including microservices architectures, streaming analytics and AI/ML use cases. Canonical Charmed Kafka simplifies deployment and operation of Kafka across public clouds and private data centres alike.

7 DevOps Best Practices You Should Be Following Now

In traditional engineering organizations, development and operations teams are often siloed, a scenario that can lead to friction between them. For example, developers are encouraged to write and release more and better code, while operations engineers are responsible for preventing errors and bugs from affecting customer experiences. As a result, operations teams frequently serve as gatekeepers and can significantly slow deployments down to make sure everything works first.

Codefresh is joining Octopus Deploy to create the most powerful Kubernetes CD, GitOps, CI, and Argo platform

Today marks an important milestone as Codefresh joins forces with Octopus Deploy, a leading player in the Continuous Delivery space. For those less familiar with Octopus, they have been at the forefront of Continuous Delivery for VMs and Windows, and recently stepped into Kubernetes as well.

6 Things Customers Love After Switching To CloudZero

Cloud costs are notoriously hard to predict—trickier than deciphering the emotions of a housecat. Traditional cost management tools leave many companies with a lack of visibility into where their money is going, which holds back engineering teams from making informed savings decisions. These tools also fail to bridge the gap with finance teams, who speak a different language than their developer counterparts.

Troubleshoot anomalies in workload performance with Watchdog Insights and Alerts for Live Processes

Processes—the service workloads that run on your infrastructure—are the building blocks of your application, and it’s critical to know how well they operate at every level of the stack. Degraded process performance can lead to downtime for your mission-critical services, resulting in loss of customer trust and potentially impacting revenue for the business.

Crafting new Linux schedulers with sched-ext, Rust and Ubuntu

In our ongoing exploration of Rust and Ubuntu, we delve into an experimental kernel project that leverages these technologies to create new schedulers for Linux. Playing around with CPU scheduling policies has always been a dream for many kernel hackers and OS enthusiasts. However, such work typically remains the domain of a few core kernel developers with many years of experience.

Incident Commander Training Strategies: What The Books Don't Tell You

This article has been lightly revised and reposted, with the author's permission, from the original on Medium. So, you’re training incident commanders (ICs), and you have your group read Google’s SRE books. Everyone knows what they are supposed to do and you are ready for any incident, right? Not quite. Half of your team complains that the descriptions are too vague or don’t apply to their situations, and the other half just starts to improvise. The result?