
Latest News

Monthly Moo Update | October 2021

There are a number of monitoring and observability solutions on the market today. It almost reminds me of the automobile market and the endless number of automobiles available. Sure, they all get you from point A to point B in some way. But some automobiles do it faster, more smoothly, and more efficiently, with guidance, more comfort, storage space, perhaps towing capability, and even autonomously. Moogsoft is the automobile you’ve been dreaming about in the monitoring and observability market.

FireHydrant expands Reliability Platform with Service Catalog

Today, we are happy to announce the launch of Service Catalog to help you better manage, query, and learn about the services that exist in your infrastructure. At FireHydrant, we envision a world where all software is reliable, and we’re on a mission to help every company that builds or operates software get closer to 100% reliability. Service Catalog helps you get closer to 100% reliability.

4 xMatters Use Cases That May Surprise You

xMatters is part technology, part service reliability, and a little bit of magic. If you’ve spent time on the xMatters website, you’ll likely have seen a number of valuable use cases for the platform: it can alert SREs when there’s a website outage, accelerate product development for DevOps teams, and manage on-call schedules and alerts for support teams.

Facebook, Instagram, and WhatsApp's Outage - Understanding MTTR

Yesterday, the most-used social media platforms in the world were inaccessible for 6 hours straight. Later, in a press release, Facebook revealed that the outage was due to configuration changes in their routers. There is no doubt that Facebook has an intense incident response plan, yet a small blind spot resulted in a significant business interruption. So how do we avoid this? The truth is, outages and performance issues are bound to happen in any network.

The Aftermath of the Facebook 6-Hour Outage

Less than 24 hours ago, the world came to a “social standstill” as Facebook and its sister companies, WhatsApp and Instagram, became unavailable, leaving its 3.5 billion users in a flap. The outage, which lasted almost 6 hours, shut off access for users and businesses all over the world and caused ripple effects that we will likely continue to see in the immediate (and perhaps not-so-immediate) future.

Evaluating Splunk On-Call Alternatives

Splunk On-Call (formerly VictorOps) is a popular incident response and on-call management platform that allows engineering and operations teams to collaborate with ease and resolve issues faster. As part of the Splunk Observability Suite, Splunk On-Call is combined with related products to bring monitoring, troubleshooting, and investigation into a single, comprehensive view, simplifying the process from incident detection to resolution.

How Service Catalog Increases Productivity

Productivity is defined by measuring the amount of output over a given time frame. However, this discounts the quality of output, which is crucial in moving toward a more complete definition of productivity. For services, increases in productivity generally reflect the number of feature releases over time. This leaves out the critical measurement of quality compared to quantity. This is where a Service Catalog can greatly enhance true productivity within an engineering organization.