
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Business Observability: Everything Fintech Companies Want to Know

Fintech companies operate in a complex technological and regulatory environment. They rely heavily on cloud-native technologies and microservices architectures to handle financial transactions and data, often at massive scale. To maximize application reliability, fintech companies need full visibility into their software systems and applications. An agile approach to monitoring, such as observability, is crucial to improving performance and user experience.

HA Kubernetes Monitoring using Prometheus and Thanos

In this article, we will deploy a clustered Prometheus setup that integrates Thanos. It is resilient against node failures and ensures appropriate data archiving. The setup is also scalable. It can span multiple Kubernetes clusters under the same monitoring umbrella. Finally, we will visualize and monitor all our data in accessible and beautiful Grafana dashboards.

Top 15 Infrastructure Monitoring Tools

Infrastructure monitoring tools help keep systems performing well and available, so that potential issues can be identified and resolved before they escalate. This article delves into the different infrastructure monitoring tools available and their impact on business continuity and operational efficiency.

Stile Education's Best-of-Breed Observability Strategy

"One of the best things we’ve gotten out of ChaosSearch is the ability to keep all of our data in S3. It’s cheap and easy to keep all of our data available and indexed. We can search through it at any time to dig deeper into problems that crop up." Learn more about how the Stile team can now retain log data indefinitely, rather than saving only a week or two of data in Elasticsearch. That change has increased the team’s capacity to use log data to solve business problems and unlocked new opportunities to discover deeper product insights.

Our lessons from the latest AWS us-east-1 outage

In case you missed it, AWS experienced an outage, or "elevated error rates," on its AWS Lambda APIs in the us-east-1 region between 18:52 UTC and 20:15 UTC on June 13, 2023. If this sounds familiar, it's because it's almost a replay of what happened on December 7, 2021, although that outage was significantly more severe and took longer to resolve.

Top 10 Log Management Tools in 2023

Log management tools are crucial for the security and performance of your IT infrastructure. With the right log management system, you can quickly detect and respond to any anomaly or performance issue. There are currently numerous log management platforms, each with its own set of features and benefits. While most of these platforms offer industry-standard capabilities, what sets them apart are their stand-out features, pricing, and overall user experience.