Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

IT Outage Notification Templates and Incident Communication Examples

Outages cost businesses across industries millions, and sometimes billions, of dollars. For example, Amazon may lose up to $34 million in sales within an hour of downtime, and a service outage back in March cost Meta nearly $100 million in revenue. However, that’s not all that was lost. Because of poor outage notifications and a lack of resolution details, many Meta users were kept in the dark about the outage. This Reddit thread shows many users were frustrated.

Early Cloud Adopters Are Rethinking Their Strategies

The early cloud migration gold rush promised agility, cost savings, and innovation. Yet, fast-forward a few years, and many of those “trailblazers” are now realizing their cloud strategy is anything but optimized. For those who lifted and shifted, hoping to catch the cloud wave, the tide is turning, and it’s not looking pretty. The truth? Cloud 1.0 is out. Simply moving your legacy apps to the cloud without rearchitecting was a band-aid solution.

From Siloed IT to Coordinated IT: Navigating the First Steps Towards Autonomic IT

Imagine a world where IT runs itself, monitoring and optimizing technology investments as it runs. Where IT operations are continuous: always available, always responsive, always seamless, always delivering what your organization – and your customers – need. This is Autonomic IT. However, implementing Autonomic IT is not as simple as adding technology and flipping a switch.

How to detect broken links with Playwright

One of our Slack community members recently asked if they could use Playwright and Checkly to detect broken links on their sites. They certainly can, and the answer to this question covers so many different Playwright concepts that it makes a perfect case for sharing Playwright features with the community. Let's unveil some links going nowhere! If you prefer the video version of this tutorial,

Feature Friday #26: Groups custom promise type

There’s a users promise type for managing local users. However, did you know there is also a custom one for managing local groups? You might have seen it mentioned in the CFEngine Build announcement, the blog post on Managing local groups, or the post announcing support for custom bodies. But let’s take another look. The easiest way to integrate the groups custom promise type is with cfbs: simply run cfbs add promise-type-groups in your project.

CrowdStrike: Are Regulations Failing to Ensure Continuity of Essential Services?

In recent years, regulations intended to ensure the continuity of essential services and mitigate security and availability risks have been enacted. These regulations include the Digital Operational Resilience Act (DORA) and the Network and Information Systems Regulations (NIS Regulations). In light of the recent incident involving CrowdStrike's Falcon system, it is legitimate to ask whether these regulations are truly effective.

DevOps Maturity Assessment: A CTO's Guide

DevOps isn’t just a buzzword; it’s a crucial framework that can make or break your ability to deploy software effectively. As a CTO, it’s essential to gauge how well your team is performing in their DevOps practices. This isn’t about criticism—it’s about pinpointing where you are and plotting a course for where you need to be.

The Importance of Securing Data in Traces

Trace spans are captured in the runtime after decrypting the request. This means that any sensitive data is available in plain text. This is also the case for logging; however, logging requires an explicit log statement to be coded by the engineer. Additionally, engineers can add arbitrary information to trace spans, which could expose sensitive information. Collecting sensitive information in trace spans or logging events could expose an organization to a number of risks.
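
One way to reduce that exposure is to mask sensitive values before they are attached to a span. The following is a minimal sketch, not a specific tracing library's API: the function name redact_attributes, the SENSITIVE_KEYS deny-list, and the card-number pattern are all hypothetical and would need to be adapted to your own data model.

```python
import re

# Hypothetical deny-list of attribute keys (assumption, not from any library).
SENSITIVE_KEYS = {"password", "authorization", "ssn", "card_number"}

# Rough pattern for 13 to 16 digit card-like numbers inside free text (assumption).
CARD_PATTERN = re.compile(r"\b\d{13,16}\b")


def redact_attributes(attributes: dict) -> dict:
    """Return a copy of the span attributes with sensitive values masked."""
    cleaned = {}
    for key, value in attributes.items():
        if key.lower() in SENSITIVE_KEYS:
            # Mask the whole value when the key itself is sensitive.
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, str):
            # Mask card-like numbers embedded in otherwise harmless strings.
            cleaned[key] = CARD_PATTERN.sub("[REDACTED]", value)
        else:
            cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    print(redact_attributes({
        "http.method": "POST",
        "password": "hunter2",
        "note": "paid with 4111111111111111",
    }))
```

Calling a helper like this at the point where attributes are set keeps the masking logic in one place. A deny-list only catches what it knows about, so an allow-list of permitted keys may be a safer default for some teams.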

Machine Learning and AI Explained

There is no escaping the discussion about how machine learning (ML) and AI systems will revolutionize how people and industries work. Most of this discussion needs to be revised, as companies are still evaluating how AI systems, typically large language model (LLM) systems such as OpenAI ChatGPT, Google Gemini, and Anthropic Claude, enhance worker productivity and deliver business benefits. Cybersecurity is one sector where extensive use of AI-enhanced solutions is common.