Why a big bang approach is the wrong cloud strategy

Despite all the hype from the big cloud providers, the truth is that most organisations rely on hybrid infrastructures now and will do so for the foreseeable future. Typically, this includes on-premises infrastructure and at least two public cloud providers. This is not a step on a journey to being 100 per cent cloud; it is the strategic destination many have chosen.

Autoscale your Kubernetes workloads with any Datadog metric

Editor’s note: This post was updated on August 9, 2022, to include a demonstration of how to enable highly available support for HPA. It was also updated on November 12, 2020, to include a demonstration of how to autoscale Kubernetes workloads based on custom Datadog queries using the new DatadogMetric CRD.

Monitoring Rails applications with Datadog

Rails is a Ruby framework for developing web applications. It favors the Model-View-Controller (MVC) architecture and includes generators that create the files needed for each MVC component. Rails applications consist of a database, an application server for running application code, and a web server for processing requests. Rails provides multiple integrations for its supporting database (e.g., MySQL and PostgreSQL) and web server (e.g., Apache and NGINX).

Why AIOps may be necessary for the future of engineering

Machine learning has crossed the chasm. In 2020, McKinsey found that out of 2,395 companies surveyed, 50% had an ongoing investment in machine learning. By 2030, machine learning is predicted to deliver around $13 trillion in economic value. Before long, a good understanding of machine learning (ML) will be a central requirement in any technical strategy. The question is: what role is artificial intelligence (AI) going to play in engineering?

Demystifying AIOps for IT practitioners

If your organization is looking to improve its IT service management (ITSM) or IT operations management (ITOM) capabilities, then it’s probably considering Artificial Intelligence for IT Operations, commonly called “AIOps.” But what is AIOps, and how will it improve your organization’s IT management capabilities and, ultimately, its business operations and outcomes? Let’s start with an AIOps definition.

Tales from the Toil: Taking the pulse of SRE

Site Reliability Engineering (SRE) is a growing practice that enterprises rely on to ensure service delivery, reliability, and access for users. Many companies choose to invest in SRE only when they have a raging operational fire on their hands. As a result, SREs often start out as firefighters, desperately trying to keep the service online for one more day.

5 Common Cybersecurity Mistakes You Can Easily Prevent

A comprehensive organizational strategy and robust company security policy are crucial for effective cybersecurity. A company needs to make a concerted effort to design, execute, and follow through with a plan to deal with cyber-risk management from top to bottom. There is no one-size-fits-all strategy for the needs of enterprises in managing cyber risk. But in order to maintain strong system security in the face of constant threats, there are some core principles that every company should follow.

OpenTelemetry Architecture: Understanding Collectors

Telemetry data is a powerful tool for understanding the behavior of complex systems. OpenTelemetry provides a platform-agnostic, open-source way to collect, process, and export telemetry data. This post explores the OpenTelemetry architecture, focusing specifically on the Collector component. We'll look at how collectors work and how they can be used to process telemetry data from any system or application. We'll also discuss some benefits of using OpenTelemetry for your telemetry needs.

Rerouting of Kherson follows familiar gameplan

Since the beginning of June this year, internet connectivity in the Russian-held Ukrainian city of Kherson has been rerouted through Crimea, the peninsula in southern Ukraine that has been occupied by Russia since March 2014. As I explain in this blog post, the rerouting of internet service in Kherson appears to parallel what took place following the Russian annexation of the Crimean peninsula.

Pre- and post-deployment testing methodologies for CI/CD

Your team has worked hard on a software product for months, and it’s finally ready to release to your users! But then the worst-case scenario happens: a wide release soon indicates that the software is plagued with bugs and performance issues, resulting in poor reviews and widespread user dissatisfaction.