The latest News and Information on Log Management, Log Analytics and related technologies.

Apache Kafka Consumer Lag Monitoring

The world runs on data processing. Humans process data constantly: each sound we hear and each picture we see is data for our brain. The same goes for modern applications and algorithms, where data is the fuel that allows them to function and provide useful features. This thinking is not new; what is new in recent years is the requirement to process large quantities of events in near real time.

Kubernetes Incident Response Best Practices

Inevitably, organizations that use technology (regardless of the extent) will have something, somewhere, go wrong. The key to a successful organization is to have the tools and processes in place to handle these incidents and get systems restored in a repeatable and reliable way in as little time as possible.

CI/CD & DevOps Pipeline Analytics: A Primer

Tracking application-level and infrastructure-level metrics is part of what it takes to deliver software successfully. These metrics provide deep visibility into application environments, allowing teams to home in on performance issues that arise from within applications or infrastructure. What application and infrastructure metrics can’t deliver, however — at least not on their own — is breadth.

Slack's New Logging Storage Engine Challenges Elasticsearch

Elasticsearch has long been the prominent solution for log management and analytics. Cloud-native and microservices architectures, together with the surge in workload volumes and diversity, have surfaced some challenges for web-scale enterprises such as Slack and Twitter. My podcast guest Suman Karumuri, a Sr. Staff Software Engineer at Slack, has made a career of solving this problem. In my chat with Suman, he discusses publicly for the first time a new project from his team at Slack: KalDB.

C-Suite Reporting with Log Management

When security analysts choose technology, they approach the process like a mechanic looking to purchase a car. They want to look under the hood and see how the product works. They need to evaluate the product as a technologist. On the other hand, the C-suite has different evaluation criteria. Senior leadership approaches the process like a consumer buying a car.

Using AI & ML for Application Performance (APM)

Today, IT and site reliability engineering (SRE) teams face pressure to remediate problems faster than ever, within environments that are larger than ever, while contending with architectures that are more complex than ever. In the face of these challenges, artificial intelligence has become a must-have feature for managing complex application performance or availability problems at scale.

Cloud Log Management Strategy & Best Practices

For IT Operations and Site Reliability Engineering (SRE) teams, logging is nothing new. In fact, collecting and analyzing logs is one of the oldest cornerstones of performance management. Logs have been part and parcel of APM workflows for decades. Yet the logging strategies that worked in eras past often fall short today. That’s thanks to the advent of cloud-native computing, which has ushered in fundamental new challenges in the way teams aggregate, analyze, and manage logs.

Are You Curious? Announcing the Launch of Cribl Curious: A Q&A Site for the Cribl-Inclined

Our amazing user community is growing so fast that we want to give you more resources to learn and share your knowledge and experience with others. So…today we launch Cribl Curious! Curious is a Q&A site for asking and answering technical questions about Cribl Stream, Cloud, Edge, Packs, and AppScope. Goat a question about how something works in Cribl? Come on over to see how your peers have solved similar problems. Checked the docs and it’s just not clicking for you?