The latest News and Information on Service Reliability Engineering and related technologies.

The Medium is the Message: How to Master the Most Essential Incident Communication Channels

We’ve all seen it: a company experiences a major incident and goes radio silent, leaving its customers to wonder, “Are they doing something about this?!” If you’ve ever been on the inside of something like this, you know the answer is most likely yes: there are people working hard to put out the fire as quickly as possible. But when it comes to incidents, perception is reality for customers.

Looking Beyond Atlassian StatusPage: The 5 Best Alternatives

Status Pages are crucial cogs in your Incident Communication process: they serve as vital channels to keep your stakeholders informed during periods of downtime. Although there are many proficient tools on the market, such as Atlassian StatusPage and Status.io, these standalone Status Pages can come with a hefty price tag, with various pricing plans and tiers for both Public and Private Status Pages. Moreover, with Atlassian Cloud’s recent issues, its dependability is in question.

Breaking Down the Pillars of Observability from Data to Outcomes

The world of cloud-native and distributed microservices has revolutionized software development and deployment. However, the sheer volume of data these systems generate can often lead to confusion and uncertainty. You're not alone if you've ever felt lost in the sea of observability data.

Webinar: Embracing Declarative Provisioning and Observability in cloud environments

In today's rapidly evolving technological landscape, organizations face increasingly complex challenges in deploying and managing their systems. Declarative provisioning and observability have emerged as a powerful combination for addressing these challenges. This talk delves into declarative provisioning and observability, exploring their benefits, principles, and practical implementation strategies.

Introduction to ELK Tech Stack

The ELK Stack, also known as the Elastic Stack, is a powerful and versatile open-source toolset that has revolutionized the way businesses manage and analyze their data. It integrates three components, Elasticsearch, Logstash, and Kibana, to offer a comprehensive solution for searching, analyzing, and visualizing large volumes of data in real time. So buckle up for an overview of the ELK Stack and its components, which will be a great starting point for beginners.

Pinpoint performance issues in downstream services with the Dependency Map Navigator

Visibility into the upstream and downstream dependencies of your services is key to maintaining a performant microservices environment. Application developers and SREs rely on this visibility to quickly trace issues back to the source, which is essential during incidents (when time is of the essence), throughout day-to-day operations, and as systems evolve and scale.

Enhanced Incident Response: Maximizing Microsoft Teams with Squadcast

Of late, more and more businesses are relying on ChatOps tools like Microsoft Teams for a range of functions beyond simple communication. Incident management is no exception to this growing trend. However, Microsoft Teams alone may not possess all the necessary capabilities to perform these functions efficiently. To bridge this gap, integration with core applications becomes necessary.

Take back control of your Monitoring

The challenges in the monitoring world are widely known: we all know what these problems are and why they matter. While each problem has its own solution, it all boils down to one thing: COST. How do we balance the tradeoffs without worrying about the huge costs of solving these challenges? For high-precision monitoring and observability, you need efficient and high-precision control levers. Take back control of your Monitoring with Levitate, a managed time series data warehouse.