I continue to be intrigued by the evolution of software architectures and their impact on business. In my 20+ year career, I’ve participated in four such architectural transitions: the shift from client-server to the internet, the rise of 3-tier architectures underpinning rich internet applications, virtualization that upended the dominance of hardware providers, and now the shift to microservices architectures built on cloud infrastructure and software automation.
If you’re building a new application from scratch and are responsible for maintaining its availability and performance, you might wonder whether you should be monitoring logs or metrics. For us, it’s a no-brainer that you’ll want both: metrics are fast and efficient for proactively monitoring the health of your system, while logs are essential for troubleshooting an issue in detail and finding its root cause.
Graphite metrics are among the most common metrics formats in application monitoring today. Originally designed in 2006 by Chris Davis at Orbitz and open-sourced in 2008, Graphite itself is a monitoring tool now used by organizations both large and small.
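For readers who have not seen the format, here is a minimal sketch of how a single data point looks in Graphite's plaintext protocol: a dotted metric path, a numeric value, and a Unix timestamp, separated by spaces and terminated by a newline. The helper name `format_graphite_line` and the metric path `servers.web01.cpu.load` are made up for illustration and are not part of Graphite itself.

```python
import time
from typing import Optional


def format_graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'.

    Hypothetical helper for illustration only; it is not a Graphite API.
    """
    # A metric path is a dotted name; whitespace would break the
    # space-separated line format, so reject it up front.
    if not path or any(ch.isspace() for ch in path):
        raise ValueError("metric path must be non-empty and contain no whitespace")
    # Default to the current time when no timestamp is supplied.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


# Example with an assumed metric name and a fixed timestamp.
line = format_graphite_line("servers.web01.cpu.load", 0.75, 1700000000)
print(line, end="")  # servers.web01.cpu.load 0.75 1700000000
```

In a typical setup, lines like this are written to the Carbon daemon over a TCP socket. The host and port depend on the deployment, so they are left out of the sketch.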
We surveyed 1,264 chat users to find out, and we started with two seemingly simple questions. What we learned was fascinating and inspiring, so we gathered up the data and created the team chat guide.
With the proliferation of virtualization and high-availability architectures, teams are chasing 99.999% uptime like knights of old hunted unicorns. Many site reliability engineers find more comfort in the Boy Scouts’ motto, “Be prepared.” Your company’s Git server is mission-critical to the daily operations of engineering and everyone they support. How do you create business continuity in the face of unpredictable circumstances?
This article explores integrating Google Pub/Sub with the ELK Stack, the world’s most popular open source log analysis platform, for deeper analysis and investigation.