Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

Feature Spotlight: Centralized Log Collection

Speedscale is proud to announce its Centralized Log Collection capability. When diagnosing the source of problems in your API, more information is better. For most engineers, the diagnosis process usually starts with the application logs. Unfortunately, logs are usually either discarded or stored in Observability systems that engineers don’t have direct access to. Compounding this issue is that the log information is typically not correlated to what calls were made against the API.

Prevent Data Downtime with Anomaly Detection

A couple months ago, a Splunk admin told us about a bad experience with data downtime. Every morning, the first thing she would do is check that her company’s data pipelines didn’t break overnight. She would log into her Splunk dashboard and then run an SPL query to get last night’s ingest volume for their main Splunk index. This was to make sure nothing looked out of the ordinary.

Using Oracle Cloud as a Data Lake Made Simple With Cribl LogStream

All Cloud providers such as AWS, Azure, Google Cloud Platform, and Oracle Cloud offer Object Storage solutions to economically store large volumes of data and retrieve it on demand. It’s far cheaper to store one petabyte of data in object storage than in block storage. As AWS S3 has become the standard, many on-premise storage appliance vendors have incorporated S3 APIs to store and retrieve data. Oracle wisely continued that trend to OCI (Oracle Cloud Infrastructure).

The Five Tenets of Observability

A new year is a chance to have a new start, and one thing that it’s a great opportunity to think about is the monitoring and observability platform you’re using for your applications. If you’ve been using a legacy monitoring system, you’ve probably heard about observability all over the ‘net and want to figure out if this is really something you need to care about.

Make the most of your observability data with the Data Volume app

As a DevOps, SecOps, or IT operations manager, you're surrounded by all the technology for the systems running the entire organization. This means legacy infrastructure, multi-cloud environments, services, tools, and applications. All of these components generate data—a huge amount of data—some of which you need to leverage for full-stack observability to ensure those systems supporting the business are running efficiently.

A (de)bug's life: Diagnosing and fixing performance issues in Grafana Loki's read path

Beep, beep, beeeeeeeep. Read path SLO page, again. And I’ve almost found the noisy neighbor! That was me. And will probably be me again at some point in the future. As we continue to scale up the team that builds and runs Grafana Loki at Grafana Labs, I’ve decided to record how I find and diagnose problems in Loki.

How Reliability and Product Teams Collaborate at Booking.com

With more than 1.5M room nights booked per day, Booking.com requires a solid infrastructure that’s constantly monitored. And indeed, Booking.com now has a footprint of 50,000+ physical servers running across four data centers and six additional points of presence. The sheer size of this server fleet makes it viable for Booking.com to have dedicated teams specializing into looking only at the reliability of those servers.

What are CDN Logs and Why Do They Matter

Content Delivery Network produces numerous log files called CDN logs to deliver video across the internet to our homes and mobile devices. These logs contain crucial information about the CDN servers' performance and video streaming quality. Also, it contains terabytes of data, which has its own set of hurdles in terms of handling it in real-time and performing analytics to understand user experience and network concerns.

Top 10+ Best System Monitoring Software & Tools [2022 Comparison]

It’s virtually impossible to manage today’s complex IT environments at scale without a comprehensive system monitoring solution that allows you to check the health of all your applications and services from a single pane of glass. When your end users are experiencing difficulties, you must have such a tool in place that lets you quickly ascertain and remediate the root cause of the slowdown or error.

Harnessing AIOps to Improve System Security

You’ve probably seen the term AIOps appear as the subject of an article or talk recently, and there’s a reason. AIOps is merging DevOps principles with Artificial Intelligence, Big Data, and Machine Learning. It provides visibility into performance and system data on a massive scale, automating IT operations through multi-layered platforms while delivering real-time analytics.