
Latest News

What is the 5G Edge and Multi-Access Edge Computing?

The 5G Edge is revolutionizing the telecommunications industry by significantly enhancing network performance, bringing computing power closer to users, and dramatically reducing latency, enabling faster and more efficient services. This advancement is crucial for a variety of applications across different sectors, including smart cities, autonomous vehicles, healthcare, and industrial automation.

Four Simple Steps for Streaming DX NetOps Alarms into Google BigQuery

In today's interconnected world, ensuring network reliability and performance is a must. Network alarms serve as the first line of defense in identifying and mitigating potential issues, giving network operations teams the insight they need to respond swiftly and effectively. To truly boost agility and efficiency, those alarms must be real-time and actionable.
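The article's four steps aren't reproduced here, but the core idea of a streaming insert can be sketched in a few lines. The example below assumes the google-cloud-bigquery Python client and a pre-created dataset and table; the project name, table name, and alarm fields are illustrative placeholders rather than a real DX NetOps schema.

```python
# Minimal sketch: pushing a NetOps-style alarm record into BigQuery.
# Assumes the google-cloud-bigquery client library and a pre-created
# dataset/table; project, table, and alarm fields are illustrative only.
from google.cloud import bigquery

client = bigquery.Client(project="my-observability-project")  # hypothetical project
table_id = "my-observability-project.netops.alarms"           # hypothetical table

alarm_rows = [
    {
        "alarm_id": "12345",
        "severity": "CRITICAL",
        "device": "core-router-01",
        "message": "Interface GigabitEthernet0/1 down",
        "raised_at": "2024-06-01T10:15:00Z",
    }
]

# Streaming insert: rows become queryable within seconds of arrival.
errors = client.insert_rows_json(table_id, alarm_rows)
if errors:
    raise RuntimeError(f"BigQuery streaming insert failed: {errors}")
```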

Introducing KlaudiaAI: Redefining Kubernetes Troubleshooting with the Power of AI

For years, AI in operations was plagued by noise: overwhelming alerts, false positives, and a lack of actionable insights. The tools available promised much but often delivered little, eroding trust. However, with the groundbreaking work of platforms like OpenAI and the emergence of trustworthy AI tools like Copilot, the potential of AI in operations has never been closer to being realized.

Prompt Guidance for Anthropic Claude and AWS Titan on AWS Bedrock

When working with advanced AI models like Anthropic’s Claude and AWS Titan, precise communication is key. Both models require specific prompting techniques to maximize their potential. For Claude, clear instructions, structured prompts, and role assignments lead to better accuracy and responsiveness. On the other hand, AWS Titan thrives on concise, well-defined requests and delivers streamlined outputs by default.
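As a rough illustration of a structured, role-assigned prompt for Claude, the sketch below uses the Amazon Bedrock Runtime Converse API via boto3. The model ID, system role, and prompt text are assumptions made for the example, not the article's exact guidance.

```python
# Minimal sketch: a structured, role-assigned prompt to Claude on Amazon Bedrock
# via the Converse API. Assumes boto3 credentials and region are configured;
# the model ID and prompt text are illustrative, not a recommendation.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    system=[{"text": "You are a network operations assistant. Answer concisely."}],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": (
                        "Summarize the following alarm in two sentences, then "
                        "list the three most likely root causes.\n\n"
                        "<alarm>Interface GigabitEthernet0/1 down on core-router-01</alarm>"
                    )
                }
            ],
        }
    ],
    inferenceConfig={"maxTokens": 300, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```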

The big ideas behind retrieval augmented generation

It’s 10:00 p.m. on a Sunday when my 9th grader bursts into my room in tears. She says she doesn’t understand anything about algebra and is doomed to fail. I jump into supermom mode only to discover I don’t remember anything about high school math. So, I do what any supermom does in 2024 and head to ChatGPT for help. These generative AI chatbots are amazing. I quickly get a detailed explanation of how to solve all her problems.

MFA Configuration: How Automation Lets You Configure & Enforce MFA Compliance at Scale

You probably used multi-factor authentication (MFA) to access the device you’re using right now. Maybe your phone scanned your face or fingerprint to unlock. Maybe you got a text with a verification code while logging into your work browser profile. Configuring MFA is a go-to measure for system hardening, but MFA enforcement can get unruly, especially at the scale required by enterprise IT.
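One way to picture that automation is a compliance sweep that walks every account and flags users without MFA. The sketch below uses AWS IAM via boto3 purely as an illustration; the article may target other identity providers, and the reporting step is a placeholder.

```python
# Minimal sketch: auditing MFA compliance at scale, using AWS IAM via boto3
# purely as an illustration. Paginates through all users and flags any user
# without a registered MFA device.
import boto3

iam = boto3.client("iam")
non_compliant = []

for page in iam.get_paginator("list_users").paginate():
    for user in page["Users"]:
        devices = iam.list_mfa_devices(UserName=user["UserName"])["MFADevices"]
        if not devices:
            non_compliant.append(user["UserName"])

# Placeholder reporting step; a real sweep might open tickets or enforce policy.
if non_compliant:
    print(f"{len(non_compliant)} users without MFA:", ", ".join(non_compliant))
else:
    print("All IAM users have MFA configured.")
```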

Creating In-Stream Alerts for Telemetry Data

Alerts that you receive from your observability tool are based on conditions that existed seconds to minutes in the past, because an alert is only triggered after the data has been indexed within the tool. This significantly limits your ability to take timely action, and often the window of opportunity to react has already passed by the time you receive the alert.
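In-stream alerting moves that evaluation into the pipeline itself, so the condition is checked as each record passes through rather than after indexing. The sketch below is a minimal illustration; the metric name, threshold, and notification hook are all hypothetical.

```python
# Minimal sketch: an in-stream alert evaluated as telemetry records flow
# through the pipeline, before they are indexed by the observability backend.
# Field names, the threshold, and the notify step are all hypothetical.
from typing import Iterable, Iterator

CPU_THRESHOLD = 95.0  # percent

def notify(message: str) -> None:
    # Placeholder for a real notification (PagerDuty, Slack, webhook, ...).
    print(f"ALERT: {message}")

def process_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Forward every record downstream, alerting in-line when a condition matches."""
    for record in records:
        if record.get("metric") == "cpu.utilization" and record.get("value", 0) > CPU_THRESHOLD:
            notify(f"{record.get('host', 'unknown')} CPU at {record['value']}%")
        yield record  # the record continues to its destination unchanged

# Example: a small batch of synthetic telemetry records.
sample = [
    {"host": "web-01", "metric": "cpu.utilization", "value": 97.2},
    {"host": "web-02", "metric": "cpu.utilization", "value": 41.0},
]
list(process_stream(sample))
```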

Creating Re-Usable Components for Telemetry Pipelines

One challenge for the widespread adoption of telemetry pipelines by SRE teams within an organization is knowing where to start when building a pipeline. Faced with a wide assortment of sources, processors, and destinations, teams can feel like they are building a Lego set without the instructions. The solution is to provide them with pre-defined components, each delivering a specific piece of functionality, that they can then combine into pipelines that meet their own requirements.
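As a rough sketch of what such pre-defined components might look like, the example below wires a source, a processor, and a destination into a pipeline. The component names and interfaces are illustrative only, not any particular vendor's API.

```python
# Minimal sketch: pre-defined telemetry pipeline components (source, processor,
# destination) that teams can compose without starting from scratch. The
# interfaces and component names are illustrative.
from typing import Callable, Iterable, Iterator, List

Record = dict
Processor = Callable[[Record], Record]

def file_source(path: str) -> Iterator[Record]:
    """Source: read raw log lines from a file."""
    with open(path) as f:
        for line in f:
            yield {"raw": line.rstrip("\n")}

def add_team_tag(team: str) -> Processor:
    """Processor factory: tag every record with the owning team."""
    return lambda record: {**record, "team": team}

def stdout_destination(records: Iterable[Record]) -> None:
    """Destination: emit records (stand-in for a real backend)."""
    for record in records:
        print(record)

def run_pipeline(source: Iterable[Record], processors: List[Processor],
                 destination: Callable[[Iterable[Record]], None]) -> None:
    """Wire reusable components into a pipeline."""
    def processed() -> Iterator[Record]:
        for record in source:
            for processor in processors:
                record = processor(record)
            yield record
    destination(processed())

# A team assembles its own pipeline from the shared components, e.g.:
# run_pipeline(file_source("/var/log/app.log"), [add_team_tag("sre")], stdout_destination)
```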

Enhancing Postmortem Reports with AI

Postmortem reports are essential in incident management, helping teams learn from past mistakes and prevent future issues. Traditionally, creating these reports was a slow, tedious process, requiring teams to gather data from multiple sources and piece together what happened. But with AI and Large Language Models (LLMs), this process can become faster, smarter, and much less of a headache.
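As one way to picture that, the sketch below feeds collected incident data to an LLM and asks for a structured postmortem draft. It uses the OpenAI Python client as an illustration; the model name, incident fields, and prompt are assumptions, not the article's method.

```python
# Minimal sketch: drafting a postmortem from collected incident data with an
# LLM, using the OpenAI Python client as one illustration. The model name,
# incident fields, and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

incident = {
    "id": "INC-2041",
    "summary": "Checkout latency spike",
    "timeline": [
        "14:02 p95 latency alert fired",
        "14:10 rollback of release 2024.6.1 started",
        "14:25 latency back to baseline",
    ],
    "impact": "Roughly 8% of checkout requests exceeded 5s for 23 minutes.",
}

prompt = (
    "Write a blameless postmortem with sections: Summary, Impact, Timeline, "
    "Root Cause (mark as unknown if not evident), and Action Items.\n\n"
    f"Incident data: {incident}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```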