DevOps

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Featured Post

Incidents are lessons, not failures

Delivering digital operations excellence - DevOps, incident management, and keeping organisations running - is a constant challenge. As customers' digital expectations rise, so does the complexity of the tech stack and cloud service integrations. But insisting on 100% uptime and rushing through incident management without taking the lessons into account creates a poor culture that can undermine the DevOps team's effectiveness. This is not how a business creates resilient infrastructure and high-performing teams.

Rootly On-Call: On-Call Shadowing Feature

Shadowing experienced responders is one of the most effective ways for folks who are new to on-call to gain the confidence and knowledge to handle incidents independently. Traditionally, shadow rotations are cumbersome to set up, since they involve duplicating and editing an existing schedule. For Rootly On-Call users, setting up shadow rotations couldn’t be easier with our new native Shadowing feature. Here are a few highlights.

MongoDB use cases for the telecommunications industry

A trusted database is fundamental to the smooth and secure operation of telecommunications services, from network management and customer service to compliance and fraud prevention. MongoDB is one of the most widely used databases (DB Engines, 2024) for enterprises, including those in the telecommunications industry. It provides a sturdy, adaptable and trustworthy foundation. It also safeguards sensitive customer data while facilitating swift responses to rapidly evolving situations.

AKS Cost Optimization: How To Lower Your AKS Costs

Cloud-native applications continue to evolve and grow in complexity. And that complexity hurts the most when managing Kubernetes costs in Azure. AKS cost optimization may seem obvious, but it might also seem difficult to achieve. Microsoft’s fully managed Kubernetes service can help you run, manage, and deploy containerized applications. And while it optimizes performance, it can cause unexpected costs when improperly managed.

[New] Schedule Overrides is now live for every team member!

We are excited to announce a significant enhancement to our scheduling feature based on your valuable feedback! At Zenduty, we understand the importance of flexibility and efficiency in managing on-call schedules and ensuring seamless incident response. Previously, only team managers had the capability to edit schedules and add overrides. This meant that non-manager team members had to reach out to their managers to request override coverage, potentially delaying critical adjustments.

Enhancing Git Management in Python Projects

Git is an essential tool for version control, whether you are a developer or an IT pro. Git allows engineers to track changes, collaborate, and manage their code effectively. However, for beginners, navigating Git can be daunting. Enter GitLens, a powerful Visual Studio Code (VS Code) extension designed to enhance Git capabilities and simplify Git management.

Monitor Your ZFS Volume Manager With Telegraf

ZFS (Zettabyte File System) is a file system and volume manager with robust data integrity features. It checksums every block of data, so that data corruption is detected and, where redundant copies are available, corrected. Additionally, it offers advanced features such as pooled storage, efficient snapshots and cloning, built-in data compression, deduplication, and high scalability, making it well suited to large-scale and high-performance storage environments.

Top Metrics for CRM companies

CRMs are a valuable tool for businesses to organize their sales and customers. The benefits of having one include increased revenue, better visibility into accounts, automated tasks, and more. But if your CRM is not working properly, it can create challenges for your business. CRM monitoring helps you fix problems before they become apparent. In this article, we’ll show you how to start with MetricFire.

Are Your Data Centers and IT Closets Prepared for the Next CrowdStrike Event?

On July 19, 2024, a major IT disaster struck when a CrowdStrike update caused widespread chaos. CrowdStrike, a cybersecurity firm, inadvertently pushed a faulty “sensor configuration update” for its Falcon Sensor software. This update caused 8.5 million Windows devices to crash. The impact was severe, affecting airlines, banking systems, and healthcare networks, and the recovery process was laborious, requiring manual intervention for impacted devices.