A CoPE's Duty: Indexing on Prod

Odds are that a software engineer today is really focused on one place: pre-prod. Short for “pre-production,” this is slang for an environment where software code operates in a prototype phase of its development lifecycle. Common sense would have one believe that this is a safe space, a workbench of sorts, where problems can be found and remediated.

Best practices for monitoring and remediating connection churn

Elevated connection churn can be a sign of an unhealthy distributed system. Connection churn refers to the rate of TCP client connections and disconnections in a system. Opening a connection incurs a CPU cost on both the client and server side. Keeping those connections alive also has a memory cost. Both the memory and CPU overhead can starve your client and server processes of resources for more important work.
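To make the definition concrete, here is a minimal Python sketch that turns two readings of cumulative open/close counters into a churn rate (connection events per second). It assumes your client, server, or proxy exposes monotonically increasing "connections opened" and "connections closed" counters. The `ConnSample` type and `churn_rate` function are hypothetical names for illustration, not any particular tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnSample:
    """A point-in-time reading of cumulative connection counters (hypothetical shape)."""
    timestamp_s: float   # seconds since some fixed reference point
    opened_total: int    # monotonically increasing count of connections opened
    closed_total: int    # monotonically increasing count of connections closed


def churn_rate(prev: ConnSample, curr: ConnSample) -> float:
    """Return connection churn in events per second between two samples.

    Churn is modeled here as (opens + closes) per second, i.e. the rate of
    connections and disconnections over the sampling interval.
    """
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    opened = curr.opened_total - prev.opened_total
    closed = curr.closed_total - prev.closed_total
    if opened < 0 or closed < 0:
        # A counter went backwards, e.g. the process restarted and reset it.
        raise ValueError("counter reset detected")
    return (opened + closed) / elapsed


if __name__ == "__main__":
    a = ConnSample(timestamp_s=0.0, opened_total=1_000, closed_total=950)
    b = ConnSample(timestamp_s=60.0, opened_total=4_000, closed_total=3_900)
    print(f"{churn_rate(a, b):.1f} connection events/s")
```

Tracking this rate over time and comparing it with a known-healthy baseline is one way to notice elevated churn; what counts as "elevated" depends on the workload.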

What is the 5G Edge and Multi-Access Edge Computing?

The 5G Edge is revolutionising the telecommunications industry by significantly enhancing network performance, bringing computing power closer to users, and dramatically reducing latency, enabling faster and more efficient services. This advancement is crucial for a variety of applications across different sectors, including smart cities, autonomous vehicles, healthcare, and industrial automation.

Four Simple Steps for Streaming DX NetOps Alarms into Google BigQuery

In today's interconnected world, ensuring network reliability and performance is not just important; it's a must. Network alarms serve as the first line of defense in identifying and mitigating potential issues, providing network operations teams with the actionable insights they need to respond swiftly and effectively. To truly empower network operations teams to boost agility and efficiency, these alarms must be real-time and actionable.

Introducing KlaudiaAI: Redefining Kubernetes Troubleshooting with the Power of AI

For years, AI in operations was plagued by noise: overwhelming alerts, false positives, and a lack of actionable insights. The tools available promised much but often delivered little, leading to a loss of trust. However, with the groundbreaking work of companies like OpenAI and the emergence of trustworthy AI tools like Copilot, the potential of AI in operations has never been nearer and clearer.

Prompt Guidance for Anthropic Claude and AWS Titan on AWS Bedrock

When working with advanced AI models like Anthropic’s Claude and AWS Titan, precise communication is key. Both models require specific prompting techniques to maximize their potential. For Claude, clear instructions, structured prompts, and role assignments lead to better accuracy and responsiveness. On the other hand, AWS Titan thrives on concise, well-defined requests and delivers streamlined outputs by default.

The big ideas behind retrieval augmented generation

It’s 10:00 p.m. on a Sunday when my 9th grader bursts into my room in tears. She says she doesn’t understand anything about algebra and is doomed to fail. I jump into supermom mode only to discover I don’t remember anything about high school math. So, I do what any supermom does in 2024 and head to ChatGPT for help. These generative AI chatbots are amazing. I quickly get a detailed explanation of how to solve all her problems.

MFA Configuration: How Automation Lets You Configure & Enforce MFA Compliance at Scale

You probably used multi-factor authentication (MFA) to access the device you’re using right now. Maybe your phone scanned your face or fingerprint to unlock. Maybe you got a text with a verification code while logging into your work browser profile. Configuring MFA is a go-to measure for system hardening, but MFA enforcement can get unruly, especially at the scale required by enterprise IT.

Creating In-Stream Alerts for Telemetry Data

Alerts that you receive from your observability tool are based on conditions that existed seconds to minutes in the past, because the alert is only triggered after the data has been indexed within the tool. This significantly limits your ability to take timely action in response to the condition, and often your window of opportunity to react has passed by the time you receive the alert.
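The difference between alerting after indexing and alerting in the stream can be sketched in Python. The hypothetical `in_stream_alerts` generator below evaluates a condition on each record as it passes through a pipeline stage and calls `on_alert` the moment the condition holds, while still forwarding every record unchanged to its destination. The record shape (a dict with a `ts` timestamp in seconds), the sliding-window "more than N errors in W seconds" rule, and all names are assumptions for illustration; the sketch also assumes records arrive in timestamp order.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def in_stream_alerts(
    records: Iterable[dict],
    is_error: Callable[[dict], bool],
    window_s: float,
    threshold: int,
    on_alert: Callable[[float, int], None],
) -> Iterator[dict]:
    """Pass records through unchanged while evaluating an alert rule in flight.

    Fires `on_alert(ts, count)` as soon as more than `threshold` error records
    fall within the trailing `window_s` seconds (hypothetical rule).
    """
    error_times = deque()  # timestamps of recent error records
    firing = False
    for rec in records:
        ts = rec["ts"]
        if is_error(rec):
            error_times.append(ts)
        # Drop error timestamps that have slid out of the trailing window.
        while error_times and error_times[0] <= ts - window_s:
            error_times.popleft()
        count = len(error_times)
        if count > threshold and not firing:
            firing = True
            on_alert(ts, count)
        elif count <= threshold:
            firing = False  # re-arm once the condition clears
        yield rec  # forward to the next stage or destination


if __name__ == "__main__":
    # Simulated stream: a burst of HTTP 5xx responses between t=3 and t=6.
    stream = [{"ts": float(t), "status": 503 if 3 <= t <= 6 else 200} for t in range(10)]
    forwarded = list(in_stream_alerts(
        stream,
        is_error=lambda r: r["status"] >= 500,
        window_s=5.0,
        threshold=2,
        on_alert=lambda ts, n: print(f"ALERT at t={ts}: {n} errors in window"),
    ))
    print(f"forwarded {len(forwarded)} records downstream")
```

Re-arming only after the count drops back under the threshold keeps the sketch from firing on every record during a sustained spike. A real pipeline would also need to decide how to handle late or out-of-order data.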

Creating Re-Usable Components for Telemetry Pipelines

One challenge for the widespread adoption of telemetry pipelines by SRE teams within an organization is knowing where to start when building a pipeline. Faced with a wide assortment of sources, processors, and destinations, setting up a telemetry pipeline can seem like trying to build a Lego set without any instructions. The solution is to provide teams with pre-defined components that each provide specific functionality, which they can then combine into pipelines that meet their own requirements.
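The idea of pre-defined, composable components can be illustrated with a minimal Python sketch. Each hypothetical factory (`drop_fields`, `keep_if`, `rename_field`) returns a processor that maps a stream of records to a new stream, and `build_pipeline` chains them in order. Sources and destinations are stood in for by plain Python iterables; none of these names correspond to a specific telemetry pipeline product's API.

```python
from typing import Callable, Iterable, Iterator

# A processor takes a stream of records and returns a new stream.
Processor = Callable[[Iterable[dict]], Iterator[dict]]


def drop_fields(*names: str) -> Processor:
    """Reusable component: remove the named keys from every record."""
    def process(records: Iterable[dict]) -> Iterator[dict]:
        for rec in records:
            yield {k: v for k, v in rec.items() if k not in names}
    return process


def keep_if(predicate: Callable[[dict], bool]) -> Processor:
    """Reusable component: forward only records that satisfy `predicate`."""
    def process(records: Iterable[dict]) -> Iterator[dict]:
        return (rec for rec in records if predicate(rec))
    return process


def rename_field(old: str, new: str) -> Processor:
    """Reusable component: rename one key; records without it pass through as-is."""
    def process(records: Iterable[dict]) -> Iterator[dict]:
        for rec in records:
            if old in rec:
                rec = dict(rec)  # copy so the caller's record is not mutated
                rec[new] = rec.pop(old)
            yield rec
    return process


def build_pipeline(*stages: Processor) -> Processor:
    """Chain processors so each stage consumes the previous stage's output."""
    def run(records: Iterable[dict]) -> Iterator[dict]:
        stream: Iterable[dict] = records
        for stage in stages:
            stream = stage(stream)
        return iter(stream)
    return run


if __name__ == "__main__":
    # One team's pipeline, assembled from shared components.
    pipeline = build_pipeline(
        keep_if(lambda r: r.get("level") != "debug"),
        drop_fields("password"),
        rename_field("msg", "message"),
    )
    logs = [
        {"level": "debug", "msg": "cache hit"},
        {"level": "error", "msg": "timeout", "password": "x"},
    ]
    for rec in pipeline(logs):
        print(rec)
```

Because every component shares the same record-stream-in, record-stream-out shape, a platform team could publish a library of them and let each SRE team snap together only the pieces it needs.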