The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Maintaining reliable services with advanced Cloud Logging features

We’ve covered ingesting, routing, storing, and viewing logs from your services in Cloud Logging already, but what else can you do with all that data? In this episode of Engineering for Reliability, we show how you can use advanced features like alerting on logs, logs-based metrics, and capturing application exceptions in Error Reporting. Watch to learn how you can find issues faster, make your services more reliable, and keep your users happy.

Continuous integration of Deno APIs

Development teams provisioning software services face a constant trade-off between speed and accuracy. New features should ship as quickly as possible and with a high degree of accuracy, meaning no downtime. Manual integration processes for managing codebases are prone to human error, which often causes unforeseen downtime. These unexpected interruptions are frequently what drives a team to take on the challenge of automating its integration process.

Rate Limiting with the HAProxy Kubernetes Ingress Controller

Add IP-by-IP rate limiting to the HAProxy Kubernetes Ingress Controller. DDoS (distributed denial of service) events occur when an attacker or group of attackers floods your application or API with disruptive traffic, hoping to exhaust its resources and prevent it from functioning properly. Bots and scrapers, too, can misbehave, making far more requests than is reasonable.
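To make the idea of IP-by-IP rate limiting concrete, here is a minimal, generic sketch in Python. It is not how HAProxy or its Kubernetes Ingress Controller implements rate limiting. It assumes a simple fixed-window counter, and the class name `PerIPRateLimiter` and its `limit` and `window_seconds` parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class PerIPRateLimiter:
    """Hypothetical fixed-window limiter: at most `limit` requests per client IP per window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable clock so the behavior can be tested deterministically
        # Maps client IP -> (start time of its current window, requests counted in that window)
        self.windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        start, count = self.windows.get(client_ip, (now, 0))
        if now - start >= self.window_seconds:
            # This IP's window has expired, so start a fresh one
            start, count = now, 0
        if count >= self.limit:
            # Over the limit: reject (a proxy would typically answer with HTTP 429 here)
            self.windows[client_ip] = (start, count)
            return False
        self.windows[client_ip] = (start, count + 1)
        return True


# Usage with a fake clock: each IP gets its own independent counter.
fake_now = [0.0]
limiter = PerIPRateLimiter(limit=3, window_seconds=10, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(4)]  # [True, True, True, False]
other_ip_allowed = limiter.allow("198.51.100.2")            # True: separate counter
fake_now[0] = 10.0
after_window = limiter.allow("203.0.113.7")                 # True: a new window has started
```

Keying the counter on the client IP is what makes a single noisy bot or scraper hit its limit without affecting other clients. A fixed window is the simplest choice; real proxies often use sliding windows or rate tables, and a production version would also need to evict idle IPs so the dictionary does not grow without bound.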

Monitoring Amazon CloudFront with Graphite via Graphite APIs

MetricFire offers complete system, infrastructure, and application monitoring built on a suite of open-source monitoring tools. With MetricFire, you can monitor all your infrastructure on a single dashboard. The platform displays metrics on the dashboard using either Hosted Prometheus or Graphite-as-a-Service.

Introducing our open source SLO Tracker - A simple tool to track SLOs and Error Budget

One of the tools we use internally at Squadcast for SLO and Error Budget tracking is now open-source. In keeping with the SRE philosophy of automating as many ops tasks as possible, we built this SLO Tracker. We made it open-source so that the SRE community can use it too. We look forward to your feedback, suggestions, and patches :)

Top 8 uses of cloud computing

The cloud is gaining widespread adoption. For many organizations, cloud computing has become an indispensable tool for communication and collaboration across distributed teams. Whether you are on Amazon Web Services (AWS), Google Cloud, or Azure, the cloud can reduce costs, increase flexibility, and optimize resources. If you have spent your career in buzzing server rooms full of cable nests, you may be wondering what all the fuss is about.

Kubernetes Master Class: HA Rancher Managing EKS, GKE and AKS

In this Master Class session, we will focus on the challenges of keeping configuration and policy consistent across multi-cloud clusters, and then demonstrate how SUSE Rancher can not only close those gaps but also standardize the management of multiple clusters (specifically Amazon EKS, Google Cloud Platform GKE, and Microsoft Azure AKS) at scale.

Podcast: Break Things on Purpose | Omar Marrero, Chaos and Performance Engineering Lead at Kessel Run

In this episode, we chat with Omar Marrero, Chaos and Performance Engineering Lead at Kessel Run, a company at the forefront of delivering “combat capability that can sense and respond to any conflict in any domain, anytime, anywhere.” To say that Omar and Kessel Run are at the forefront is an understatement.