
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Applying Site Reliability Engineering 'Golden Signals' to your Kubernetes Cluster

Knowing how to monitor the four "Golden Signals" of Site Reliability Engineering (SRE), namely latency, traffic, errors, and saturation, in your Kubernetes clusters is an important skill for any engineer, especially for Day 2 operations. Fortunately, several powerful open-source tools and technologies can handle these tasks. This training session shows how to monitor the Golden Signals in a Kubernetes cluster using Prometheus and Slack.
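As a rough, illustrative sketch of what each of the four signals measures, the Python snippet below computes them for one window of request samples. The `Request` class, the `golden_signals` function, the rule that a 5xx status counts as an error, and the in-flight/capacity ratio used for saturation are all assumptions made for illustration. In a Prometheus setup these values would normally come from PromQL queries (for example, `histogram_quantile` over a latency histogram) rather than hand-written code.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    """Hypothetical record of one served request."""
    latency_ms: float  # time taken to serve the request
    status: int        # HTTP status code returned


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize the four golden signals for one observation window.

    Illustrative assumptions: a status >= 500 counts as an error, and
    saturation is approximated as in-flight requests / capacity.
    """
    total = len(requests)
    latencies = [r.latency_ms for r in requests]
    if total >= 2:
        # quantiles(n=100) returns 99 cut points; index 98 is the 99th
        # percentile. method="inclusive" interpolates within the data range.
        p99 = quantiles(latencies, n=100, method="inclusive")[98]
    else:
        # Too few samples to interpolate: fall back to the single value or 0.
        p99 = latencies[0] if latencies else 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p99_ms": p99,                            # latency
        "traffic_rps": total / window_s,                  # traffic
        "error_ratio": errors / total if total else 0.0,  # errors
        "saturation": in_flight / capacity,               # saturation
    }


# Usage: ten requests in a 5-second window, one of which failed with a 503.
reqs = [Request(latency_ms=float(ms), status=200) for ms in range(10, 100, 10)]
reqs.append(Request(latency_ms=100.0, status=503))
print(golden_signals(reqs, window_s=5, in_flight=3, capacity=10))
```

Percentile latency is used rather than the mean because a mean hides the slow tail that users actually notice; the same reasoning is why Prometheus latency alerts are commonly built on quantiles.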

Lessons learned from running Kafka at Datadog

At Datadog, we operate 40+ Kafka and ZooKeeper clusters that process trillions of datapoints across multiple infrastructure platforms, data centers, and regions every day. Over the course of operating and scaling these clusters to support increasingly diverse and demanding workloads, we’ve learned a lot about Kafka—and what happens when its default behavior doesn’t align with expectations.

A Guide to the World of Cloud-Native Applications

It all started with monolithic architecture, in which business logic, user interfaces, and data layers lived in one large program. Because these applications were tightly coupled, even a simple update meant recompiling the entire application and redistributing it to every user. That made it difficult to keep program versions consistent across all clients, which stability and alignment depended on. As a result, the monolithic approach proved inefficient and cumbersome.

Securing a Web Application with AWS Application Load Balancer

I was recently asked to secure an Nginx web server with HTTPS, using a certificate obtained from AWS Certificate Manager. It took me a while to figure out how to get everything configured and working. If you are attempting the same thing, I hope this post saves you some time!

Announcing Preview Support for Istio

Today we are announcing support for Istio with Rancher 2.3 in Preview mode. Istio, and service meshes in general, have generated a great deal of excitement in the Kubernetes ecosystem. Istio promises to add fault tolerance, canary rollouts, A/B testing, monitoring and metrics, tracing and observability, and authentication and authorization, without requiring developers to instrument their applications or write specific code to enable these capabilities.