
DevOps

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

How to monitor Kubernetes audit logs

Datadog operates large-scale Kubernetes clusters in production across multiple clouds. Along the way, audit logs have been extremely helpful for tracking user interactions with the API server, debugging issues, and gaining visibility into our workloads. In this post, we’ll show you how to use Kubernetes audit logs to get deep insight into your clusters.

Kubernetes Events Explained

Kubernetes events are a resource type that is automatically created when other resources change state, encounter errors, or emit other messages that should be broadcast to the system. Although little documentation is available for events, they are invaluable when debugging issues in your Kubernetes cluster. In this post, we will learn how to look at events, cover a few specific event types, and discuss how to monitor Kubernetes events.
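As a rough illustration of looking at events, the sketch below filters Warning events out of an event list. It assumes JSON shaped like the output of `kubectl get events -o json`; the sample data and the `warning_events` helper are hypothetical and are not taken from the linked post.

```python
import json

# Hypothetical sample, shaped like the output of `kubectl get events -o json`.
SAMPLE = """
{"items": [
  {"type": "Normal", "reason": "Scheduled",
   "message": "Successfully assigned default/web-1 to node-a",
   "involvedObject": {"kind": "Pod", "name": "web-1"}},
  {"type": "Warning", "reason": "BackOff",
   "message": "Back-off restarting failed container",
   "involvedObject": {"kind": "Pod", "name": "web-2"}}
]}
"""


def warning_events(event_list):
    """Return (kind/name, reason, message) tuples for each Warning event."""
    rows = []
    for item in event_list.get("items", []):
        if item.get("type") != "Warning":
            continue  # skip Normal events
        obj = item.get("involvedObject", {})
        target = f"{obj.get('kind')}/{obj.get('name')}"
        rows.append((target, item.get("reason"), item.get("message")))
    return rows


if __name__ == "__main__":
    for target, reason, message in warning_events(json.loads(SAMPLE)):
        print(f"{target}\t{reason}\t{message}")
```

In practice the input would come from a live cluster rather than an embedded string; the filtering step stays the same.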

Bring Test Engineering into your DevOps practice

What do a test engineer and a DevOps or SRE team member have in common? The reality is that different teams need to proactively understand what is happening in production at critical milestones along the software engineering delivery cycle. In the words of Abby Bangser, senior test engineer at Moo, “Testing has so much in common with Ops and SRE teams. We need to ask interesting questions of production. We need no more debates whether a bug gets fixed.

Canary deployments for IT operations

This article originally appeared in Jaxcenter. Canary deployments are a commonly used DevOps practice for staggered rollouts, sending small updates to groups to catch and fix issues. Ultimately, experimenting with DevOps practices such as canary deployments can help IT (and IT operations) bridge the gap with the business and deliver more value, faster.

De-Risk Application Re-Factoring

While cloud adoption in general has been on the rise, the migration of business-critical legacy application workloads to the cloud has been relatively cautious. Apart from financial risk, the primary reason for this caution is the inherent risk to business operations. Customers who venture into application refactoring broadly have two options...

Part II: Artifactory as a Caching Mechanism for Package Managers

In our previous blog post, we discussed the challenges of relying on external servers to download pre-build tools such as Curl, CLI, wget, Maven, Gradle, npm and others. We discussed how this reliance can sometimes cause stability issues, also called “Environmental Issues”, that break the build.

Monitor email workflows with Datadog Browser Tests

Monitoring your application from end to end is important for ensuring that core functionalities work as designed. Datadog’s browser tests help you verify that key user workflows—such as signing up for a new account—are consistent across devices and locations. Within these workflows, email often plays a key role in onboarding users and providing customers with important information about their accounts and application activity, such as profile changes and order confirmations.

Kubernetes Master Class: How to Run Databases in Production on Kubernetes

Databases are business-critical, and data loss poses a major operational risk for any organization. A single operational or architectural failure can lead to significant loss of time and resources. This class will provide a real-world view into the challenges of maintaining state and running databases in production and show solutions managed by Rancher.