Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Build Your Own Network with Linux and Wireguard

Last Christmas, I bought my wife “Explain the cloud like I am 10” after she told me many times that it was hard for her to relate to what I do in my daily work at Qovery. So far I have been the only one to read and enjoy the book, and while reading it I wondered whether there were any resources that explain how to build all that. Most topics are software-oriented. So, in this article, I am going to explain how to build your own cloud network 🎊

How to detect and prevent memory leaks in Kubernetes applications

In our last blog, we talked about the importance of setting memory requests when deploying applications to Kubernetes. We explained how memory requests let you specify how much memory (RAM for short) Kubernetes should reserve for a pod before deploying it. However, this only helps your pod get deployed. What happens when your pod is running and gradually consumes more RAM over time?

Upgrade to DX UIM 20.4 CU9 to Leverage New Features and Security Updates

DX Unified Infrastructure Management (DX UIM) is a powerful solution that enables comprehensive infrastructure observability across your digital ecosystems, including private, public, and hybrid clouds. With DX UIM, you can proactively and efficiently manage the performance and availability of your IT infrastructure and applications. DX UIM 20.4 is the current main branch of the solution. This release offers a number of significant capabilities that weren’t available in earlier versions.

What is Mean Time Between Failures - and why does it matter for service availability

Mean Time Between Failures (MTBF) measures the average duration between repairable failures of a system or product. MTBF helps us anticipate how likely a system, application, or service is to fail within a specific period, or how often a particular type of failure may occur. In short, MTBF is a vital incident metric that indicates product or service availability (i.e. uptime) and reliability.
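As a rough illustration of the arithmetic behind this metric, the sketch below computes MTBF as total operating time divided by the number of failures, then derives steady-state availability from it. The function names and sample figures are hypothetical, and the availability formula relies on Mean Time To Repair (MTTR), which the excerpt above does not mention; treat it as an assumption rather than the article's own method.

```python
def mtbf_hours(total_operating_hours: float, failure_count: int) -> float:
    """Average operating time between repairable failures, in hours."""
    if failure_count <= 0:
        raise ValueError("failure_count must be positive to compute MTBF")
    return total_operating_hours / failure_count


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability as MTBF / (MTBF + MTTR).

    MTTR (mean time to repair) is an assumption added for this sketch.
    Both arguments must use the same time unit.
    """
    if mtbf < 0 or mttr < 0 or (mtbf + mttr) == 0:
        raise ValueError("mtbf and mttr must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    # Hypothetical sample: 1,000 operating hours with 4 failures,
    # each taking 2 hours on average to repair.
    mtbf = mtbf_hours(1000, 4)
    print(f"MTBF: {mtbf:.1f} hours")
    print(f"Availability: {availability(mtbf, 2.0):.4%}")
```

With these made-up numbers, the service fails on average once every 250 operating hours. A longer MTBF or a shorter repair time both push availability closer to 100%.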

Enhance Your Customer Service with PagerDuty for ServiceNow CSM

In today’s fast-paced, digital-first landscape, delivering exceptional customer experience is paramount to business success. For customer service teams, that means maintaining service level agreements (SLAs) and ensuring swift responses to customer issues that can make or break your company’s reputation. Fortunately, PagerDuty has improved the way companies handle customer service teams and has built applications into ServiceNow’s CSM platform.

Future-Proof Your Observability Strategy With CrowdStrike and Cribl

Traditional logging tools are struggling to keep up with the explosive pace of data growth. Data collection isn’t the most straightforward process, so deploying and configuring all the tools necessary to manage this growth is more difficult than ever. Navigating evolving logging and monitoring requirements only adds another layer of complexity to the situation.

The Machine Learning Magic Suite: Anomaly Detection

Cloud computing and AI/machine learning (ML) are two powerful technologies that are even more impactful when used together. Cloud computing provides the infrastructure and resources needed to support AI/ML applications, while AI/ML enhances cloud computing by providing intelligent automation and decision-making capabilities.

Join the ITOps AI Revolution: Actionable Insights with VMware Tanzu Insights

Many organizations struggle with managing thousands of services and applications. A typical environment consists of a combination of modern cloud applications, on-premises workloads, and workloads that are in the process of being moved to the cloud. IT and operations teams can easily be overwhelmed by the large volume of data and activity that is generated across these systems.