Improving your team's on-call experience

Your engineers probably dislike going on-call for your services. Some might even dread it. It doesn't have to be this way. With a few changes to how your team runs on-call and deals with recurring alerts, you might find your team starting to enjoy it (as unimaginable as that sounds). I wrote this article as a follow-up to Getting over on-call anxiety.

Podcast: Break Things on Purpose | Zack Butcher, Founding Engineer at Tetrate

Welcome back to another edition of “Build Things on Purpose.” This time Jason is joined by Zack Butcher, a founding engineer at Tetrate. They also break down Istio’s ins and outs and the lessons learned there, the role of open source projects and their reception, and more. Tune in to this episode and others for all things chaos engineering!

Speedscale Announces New Software Release: Traffic Viewer for API Visibility in Kubernetes Clusters

We are excited to announce a new global release of our software with unique API visibility features to help organizations discover problems with their cloud services well before they impact customers in production.

UptimeRobot July 2021 Update: Heartbeat monitor and API rate limits

After adding new features in June, we worked on fixing some minor bugs and improving the stability of our service. We’re happy to announce that, besides those fixes, we were also able to introduce a major update to our heartbeat (background job) monitoring. It was requested by many, so let’s take a look at the details!

Archiving Is In, And Your Logs Are Here To Stay!

Archiving is in and your logs are here to stay! We develop features that streamline the log management processes for our users. Logs are information assets, and we understand that you need to retrieve, reassess, and draw insights from your historical logs. observIQ offers a simple integration with Amazon Web Services (AWS) for extended retention. It takes less than 30 seconds to set up and archive logs directly to an S3 bucket in your AWS account.

Troubleshoot GKE apps faster with monitoring data in Cloud Logging

When you’re troubleshooting an application on Google Kubernetes Engine (GKE), the more context you have on the issue, the faster you can resolve it. For example, did the pod exceed its memory allocation? Was there a permissions error reserving the storage volume? Did a rogue regex in the app pin the CPU? All of these questions require developers and operators to build a lot of troubleshooting context.

Grafana Community Plugin Showcase: August 2021

The power of community makes Grafana one of the most composable platforms for monitoring and observability across a wide variety of use cases. The Grafana Plugin Directory features plugins created not just by our team here at Grafana Labs, but also by Grafana community members all over the world. It’s the best place to browse for new data source integrations, panels, and applications you can install on your dashboard to extend Grafana’s functionality.

Troubleshooting Feature Flags with Komodor and Sentry

Komodor is a Kubernetes-native platform we’ve created to streamline troubleshooting. It was born out of the frustration we felt as developers when we had to waste hours on troubleshooting instead of focusing on what we really wanted to do: creating and innovating. Komodor sits on top of your K8s cluster and integrates with every existing tool you have, be it CI/CD, repo, monitoring, alerting, or communication.