Latest Posts

The Ultimate Guide To Incident Communication in 2024

In the digital realm, incidents such as service disruptions and security breaches are inevitable. They affect your customers and stakeholders, and they pose significant challenges to IT, Ops, DevOps, and customer support teams. As we increasingly depend on digital tools and services, the demand for seamless performance grows, which makes effective incident communication all the more important.

Your DevOps Checklist: 7 Software Deployment Best Practices

Failures and bugs are all too common during software deployment, even with the best development teams. Following software deployment best practices helps you increase efficiency, improve security, and bring products to market faster. This article combines the seven most valuable practices your team can start implementing today.

Elastic's RAG-based AI Assistant: Analyze application issues with LLMs and private GitHub issues

For an SRE, analyzing applications is more complex than ever. Not only do you have to ensure the application is running optimally to deliver great customer experiences, but in some cases you must also understand its inner workings to help troubleshoot. Analyzing issues in a production service is a team sport. It takes SRE, DevOps, development, and support teams to get to the root cause and potentially remediate. If the issue is having an impact, it's even worse, because there is a race against time.

Why business continuity belongs in the cloud?

Resilience in today’s liquid business environment demands flexibility. The term “observability” replaces monitoring, reflecting the need to adapt and be agile in the face of challenges. The key is to dissolve operations into the cloud, integrating tools and operational expertise for effective resilience. I remember that when I started my professional career (in a bank), one of the first tasks I was given was to secure an email server exposed to the internet.

Using Kubectl Logs | Complete Guide to viewing Kubernetes Pod Logs

You can obtain information about the containers and pods on your cluster using the kubectl logs command. These logs show how your applications are performing and whether they are failing or healthy, and they are particularly useful for debugging and troubleshooting. In this article, we will see how to use the kubectl logs command to get information from existing resources in a Kubernetes cluster. Before we dive in, let's first take a quick look at Kubernetes architecture and logging.
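As a rough illustration of what a kubectl logs invocation looks like, here is a minimal Python sketch that only assembles the command line. It is not taken from the article: the helper name build_kubectl_logs_cmd and the pod, namespace, and container names are hypothetical, and it assumes the standard kubectl logs flags (--namespace, --container, --tail, --since, --follow, --previous).

```python
from typing import List, Optional


def build_kubectl_logs_cmd(
    pod: str,
    namespace: Optional[str] = None,
    container: Optional[str] = None,
    tail: Optional[int] = None,
    since: Optional[str] = None,
    follow: bool = False,
    previous: bool = False,
) -> List[str]:
    """Assemble an argument list for `kubectl logs` (hypothetical helper)."""
    cmd = ["kubectl", "logs", pod]
    if namespace:
        cmd += ["--namespace", namespace]   # look in a specific namespace
    if container:
        cmd += ["--container", container]   # pick one container in the pod
    if tail is not None:
        cmd.append(f"--tail={tail}")        # only the last N lines
    if since:
        cmd.append(f"--since={since}")      # only logs newer than e.g. "1h"
    if follow:
        cmd.append("--follow")              # stream new log lines
    if previous:
        cmd.append("--previous")            # logs from the previous container instance
    return cmd


if __name__ == "__main__":
    # "web-0" and "prod" are placeholder names, not real resources.
    print(" ".join(build_kubectl_logs_cmd("web-0", namespace="prod", tail=100)))
```

The resulting list can be passed to subprocess.run on a machine where kubectl is installed and configured for your cluster; building the arguments separately keeps the sketch testable without a cluster.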

Sentry on Sentry: How Metrics saved us $160K

If you know me, you know I care about fast code. Recently, I ran a simple query that revealed that we spend almost $160k a year on one task. Luckily, we launched the Metrics beta back in March. Over the last month or so, 10 of us Sentry engineers collaborated across many functions to leverage Metrics to track custom data points and pinpoint the issue leading to this ridiculous ingestion cost.

Observability, Telemetry, and Monitoring: Learn About the Differences

Over the past five years, software and systems have become increasingly complex and challenging for teams to understand. A challenging macroeconomic environment, the rise of generative AI, and further advancements in cloud computing compound the problems faced by many organizations. Simply understanding what’s broken is difficult enough, but trying to do so while balancing the need to constantly innovate and ship makes the problem worse.

What is Early Launch Anti Malware? An Overview

In an era dominated by digital advancements, cybersecurity has become the cornerstone of technological integrity and trust. The pivotal role of cybersecurity in today’s digital landscape is exemplified by the exponential rise in cyber threats—ranging from ransomware to sophisticated phishing attacks—that demand increasingly robust defensive mechanisms.