Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

How to build the ideal engineering team dashboard

Toggle topic hub menu Engineering teams of today use a plethora of tools to perform different functions in the software development life cycle. While tools like Slack, Teams, etc. are great for quick notifications, they rarely give you a comprehensive view of the things in current state. Sure, you can switch between tabs for all your tools but an "Engineering dashboard" that brings this all together makes it much easier to consume quickly and effectively.

Evidence-Based Threat Detection With Corelight and Cribl

Organizations today face a growing list of obstacles as they try to improve their detection, coverage, and accuracy. For one, data proliferation is happening at an astronomical rate. When was the last time your network bandwidth went down? What about your license costs for data storage or your SIEM? Difficulties arise from overlapping and poorly integrated tools that generate disparate data streams and several operational efficiencies.

What IT Administrators Want to Know About Apple Vision Pro

Apple’s release of Apple Vision Pro on Feb. 2, 2024, sparked widespread anticipation among tech enthusiasts worldwide. Even more among enterprise customers when Apple announced MDM management capabilities in visionOS 1.1. Apple Vision Pro lets users interact with apps while remaining connected to their physical surroundings or immerse themselves entirely in a virtual environment of their choosing.

IT Incidents and the Role of Incident Response Teams (IRTs)

The digital world comes with advantages and inherent risks. These IT incidents, which can encompass cyberattacks, system outages, and data breaches, can have a devastating impact. Beyond financial losses, IT incidents disrupt business operations, damage reputations, and erode customer trust. During an outage, having a well-prepared Incident Response Team (IRT) is essential to reduce downtime and improve response times.

Large Language Models (LLMs) Retrieval Augmented Generation (RAG) using Charmed OpenSearch

Large Language Models (LLMs) fall under the category of Generative AI (GenAI), an artificial intelligence type that produces content based on user-defined context. These models undergo training using an extensive dataset composed of trillions of combinations of words from natural language, enabling them to empower interactive and conversational applications across various scenarios.

What is INP and why you should care

On March 12th 2024, Google is launching a new Core Web Vital metric, Interaction to Next Paint (INP). INP will replace First Input Delay (FID) and will change the way your sites are assessed for performance by Google, which ultimately affects how your sites rank in search engine results. TL;DR: You need to start optimizing for INP today so your sites are not negatively impacted after March 12th.

You Can Solve the Application Waste Problem

If you’re like most companies running large-scale data intensive workloads in the cloud, you’ve realized that you have significant quantities of waste in your environment. Smart organizations implement a host of FinOps activities to ameliorate or address this waste and the cost it incurs, things such as: … and the list goes on. These are infrastructure-level optimizations.

Navigating IT Incidents - The Role Of The Status Page

At any moment, a small failure at any point in your complex web of IT systems can trigger an outage. As such, proactively establishing a method of clear and timely end user communication is the crux of effective incident response. For large organizations, these moments of downtime not only carry a massive opportunity cost, but also test the resilience of their operations.

Easy Guide to Monitor Jenkins Jobs Using Telegraf and MetricFire

Monitoring Jenkins jobs and nodes is foundational to maintaining a robust, efficient, and secure CI/CD pipeline. It enables DevOps teams to stay proactive about system health, optimize performance, manage resources effectively, and adhere to security and compliance standards. In this article, we'll detail how to use the Telegraf agent to collect performance metrics from your Jenkins environment, and forward them to a datasource.

Introducing Process Exhaustion: How to scale your services without overwhelming your systems

We rarely think about how many processes are running on our systems. Modern CPUs are powerful enough to run thousands of processes concurrently, but at what point do our systems become oversaturated? When you’re running large-scale distributed applications, you might reach this limit sooner than you'd expect. How can you determine what that limit is, and how does that affect the number and complexity of the workloads you deploy?