Latest Posts

What IT Administrators Want to Know About Apple Vision Pro

Apple’s release of Apple Vision Pro on Feb. 2, 2024, sparked widespread anticipation among tech enthusiasts worldwide, and even more among enterprise customers once Apple announced MDM capabilities in visionOS 1.1. Apple Vision Pro lets users interact with apps while remaining connected to their physical surroundings, or immerse themselves entirely in a virtual environment of their choosing.

IT Incidents and the Role of Incident Response Teams (IRTs)

The digital world comes with advantages and inherent risks. Among those risks are IT incidents, which can encompass cyberattacks, system outages, and data breaches, and which can have a devastating impact. Beyond financial losses, IT incidents disrupt business operations, damage reputations, and erode customer trust. During an outage, having a well-prepared Incident Response Team (IRT) is essential to reduce downtime and improve response times.

Large Language Models (LLMs) Retrieval Augmented Generation (RAG) using Charmed OpenSearch

Large Language Models (LLMs) fall under the category of Generative AI (GenAI), a type of artificial intelligence that produces content based on user-defined context. These models are trained on an extensive dataset composed of trillions of combinations of words from natural language, enabling them to power interactive and conversational applications across various scenarios.

What is INP and why you should care

On March 12th, 2024, Google is launching a new Core Web Vital metric, Interaction to Next Paint (INP). INP will replace First Input Delay (FID) and will change the way Google assesses your sites for performance, which ultimately affects how your sites rank in search engine results. TL;DR: You need to start optimizing for INP today so your sites are not negatively impacted after March 12th.

You Can Solve the Application Waste Problem

If you’re like most companies running large-scale, data-intensive workloads in the cloud, you’ve realized that you have significant quantities of waste in your environment. Smart organizations implement a host of FinOps activities to reduce this waste and the cost it incurs, things such as: … and the list goes on. These are infrastructure-level optimizations.

Navigating IT Incidents - The Role Of The Status Page

At any moment, a small failure at any point in your complex web of IT systems can trigger an outage. As such, proactively establishing a method of clear and timely end-user communication is the crux of effective incident response. For large organizations, these moments of downtime not only carry a massive opportunity cost, but also test the resilience of their operations.

Easy Guide to Monitor Jenkins Jobs Using Telegraf and MetricFire

Monitoring Jenkins jobs and nodes is foundational to maintaining a robust, efficient, and secure CI/CD pipeline. It enables DevOps teams to stay proactive about system health, optimize performance, manage resources effectively, and adhere to security and compliance standards. In this article, we'll detail how to use the Telegraf agent to collect performance metrics from your Jenkins environment, and forward them to a datasource.

Introducing Process Exhaustion: How to scale your services without overwhelming your systems

We rarely think about how many processes are running on our systems. Modern CPUs are powerful enough to run thousands of processes concurrently, but at what point do our systems become oversaturated? When you’re running large-scale distributed applications, you might reach this limit sooner than you'd expect. How can you determine what that limit is, and how does that affect the number and complexity of the workloads you deploy?

Application Troubleshooting with Automated Root Cause Analysis

In the complex and fast-paced world of application deployment, getting a handle on the tangle of services and resources can sometimes feel like trying to find your way through a maze without a map. And if something goes wrong, trying to find out what's happening where is even more difficult. With alert emails flooding in and questions flying left and right, identifying the glitch that's causing issues can seem like a Herculean feat.