Latest Posts

How to improve your influence as an SRE

Nov 10, 2021 By Ricardo Castro In Squadcast

Improving your influence within the company will help you deliver high-quality work, because your goals will be closely aligned with those of the company. In this blog piece, Ricardo explains how to improve your influence as an SRE. Balancing fast-paced business requirements with the demands of keeping production services stable is not an easy task.

Monitoring RabbitMQ with Bleemeo

Nov 10, 2021 By Florian Gabon In Bleemeo

This article covers how to configure RabbitMQ with Bleemeo to collect metrics automatically, and how to build a custom dashboard to better understand what is happening on your server.

Features, the forgotten feature of Puppet

Nov 10, 2021 By Heston Snodgrass In Puppet

When you write enough Puppet code, you will eventually find yourself in need of a Facter fact or Puppet resource type that doesn’t exist in Puppet itself. Then, if you’re like me, you go to the Puppet Forge and see if someone else has written what you need. Oftentimes, you find what you need, add a new module to your Puppetfile or module metadata, and move on with your life.

Synthetic Testing and Real User Monitoring

Nov 10, 2021 By Request Metrics In Request Metrics

Synthetic Testing and Real User Monitoring are the most important tools in your performance toolbox, but they do different things and are useful at different times. Many developers master only one of these tools and so see only part of their performance problems, like trying to hammer in a screw. Let’s look at these tools, what they measure, and when to use them.

Loki 2.4 is easier to run with a new simplified deployment model

Nov 10, 2021 By Ed Welch In Grafana

Loki 2.4 is here! It comes with a very long list of cool new features, but there are a couple things I really want to focus on here. Be sure to check out the full release notes and of course the upgrade guide to get all the latest info about upgrading Loki. Also check out our ObservabilityCON 2021 session Why Loki is easier to use and operate than ever before.

Grafana Tempo 1.2 released: New features make monitoring traces 2x more efficient

Nov 10, 2021 By Joe Elliott In Grafana

Grafana Tempo 1.2 has been released! Among other things, we are proud to present both our first version to support search and the most performant version of Tempo ever released. There are also some minor breaking changes so make sure to check those out below. If you want ALL the details you can always check out the v1.2 changelog, but if that’s too much, this post will cover all the big ticket items.

Playbooks in Action: Creating Effective, Repeatable Incident Resolution Workflows

Nov 10, 2021 By Elli Ludwigson In Mattermost

While service incidents can be wildly dissimilar, they tend to have one thing in common: a need for quick resolution. Response teams need a robust, repeatable process to follow that ensures fast, mistake-free execution, especially for those 4 AM calls. Having a documented checklist saved where the entire team can access and use it at any time could make the difference between resolving the problem quickly and compounding it.

Enabling SRE best practices: new contextual traces in Cloud Logging

Nov 10, 2021 By Eyamba Ita In Google Operations

The need for relevant and contextual telemetry data to support online services has grown in the last decade as businesses undergo digital transformation. These data are typically the difference between proactively remediating application performance issues and suffering costly service downtime. Distributed tracing is a key capability for improving application performance and reliability, as noted in SRE best practices.

Network AF, Episode 5: Building relationships as an internet analyst with Doug Madory

Nov 10, 2021 By Michelle Kincaid In Kentik

Network AF welcomes Doug Madory to the podcast. Doug is a veteran, a researcher, a writer and Kentik’s director of internet analysis. With his start in the U.S. Air Force within its Information War Center, Doug has now been working in the networking industry for 12 years. After the Air Force, Doug went on to work for Renesys, which was acquired by Dyn, which was later acquired by Oracle.

Icinga Customer Story: Deutsche Telekom IT

Nov 10, 2021 By Angelika Bang In Icinga

We are proud of our many customers and users around the globe who trust Icinga for critical IT infrastructure monitoring. That's why we’re now showcasing some of these enterprises with their success stories. These are stories from companies and organizations just like yours, of any size and from many different industries. Some are long-standing customers; others have only recently benefited from migrating from another solution to Icinga.
