The latest news and information on DevOps, CI/CD, automation, and related technologies.

CD for machine learning: Deploy, monitor, retrain

While a growing number of off-the-shelf machine learning (ML) solutions promise to adapt to your specific requirements, organizations that are serious about investing in ML for the long term are building their own workflows, tailored exactly to their data and the outcomes they expect. To make full use of this investment, ML models must be kept up to date and trained on the freshest available data.

Securing Access to Cloud Native Resources with Certificates - Civo Navigate NA 2023

In this talk, Alan Vailliencourt, a Senior Solutions Engineer with Teleport, discusses the importance of moving away from passwords and securing access to cloud-native resources using short-lived certificates. He highlights the risks associated with passwords and showcases the benefits of identity-native access, incorporating proof of presence, mutual authentication, and device security. The talk provides practical steps for adopting certificate-based authentication and improving security posture for Kubernetes, databases, and other cloud resources.

The broader approach on Azure monitoring

This episode of the Azure On Air podcast tackles the challenges of monitoring IT infrastructure and of moving from on-premises to the cloud. Pedro Sousa, Microsoft Azure MVP, advocates for a shift from traditional monitoring to a holistic observability approach, starting with an understanding of business needs and working down to infrastructure details. He also offers advice on migrating from on-premises to Azure, emphasizing that observability principles stay consistent across environments.

Introducing Squadcast's Key Based Deduplication

We are excited to share another feature update with all our valued customers! We have recently gone live with our Key Based Deduplication feature, enabling you to define dedup keys using customizable templates for configured alert sources. With this feature, you can automatically group similar incidents and effectively deduplicate alerts.
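To make the idea of a templated dedup key concrete, here is a minimal sketch in Python. It does not use Squadcast's actual template syntax or API, which the announcement does not show; the `$field` placeholder style, the alert field names, and the functions `dedup_key` and `group_alerts` are all assumptions made for illustration. The sketch renders one key per incoming alert and groups alerts that share a key.

```python
from collections import defaultdict
from string import Template


def dedup_key(template: str, alert: dict) -> str:
    """Render a dedup key from an alert payload.

    The "$field" placeholder style comes from Python's string.Template
    and is an assumption, not the syntax of any particular product.
    """
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template).safe_substitute(alert)


def group_alerts(alerts: list, template: str) -> dict:
    """Group alerts whose rendered dedup keys are identical."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[dedup_key(template, alert)].append(alert)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical alert payloads; the field names are illustrative only.
    alerts = [
        {"host": "db-1", "check": "disk_full", "value": 91},
        {"host": "db-1", "check": "disk_full", "value": 95},
        {"host": "web-2", "check": "high_cpu", "value": 88},
    ]
    for key, members in group_alerts(alerts, "$host:$check").items():
        print(key, len(members))
```

Choosing which fields go into the template is the real design decision: a key built from host and check name collapses repeated firings of the same problem, while adding a volatile field such as the measured value would defeat deduplication.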

Connecting Prometheus and Grafana

Prometheus and Grafana are a strong combination for monitoring infrastructure. In this article, we discuss how to connect Prometheus to Grafana and what sets Prometheus apart from other tools on the market. MetricFire's product, Hosted Graphite, runs Graphite (a Prometheus alternative) with Grafana dashboards for you, giving you reliability and ease of use that are hard to achieve in-house.

AWS CloudWatch Custom Metrics vs Prometheus Custom Metrics

Understanding the state of your systems and their underlying infrastructure at all times is paramount for ensuring the stability and reliability of your services. Up-to-date information about the performance and health of your deployments not only helps your team react to issues in real time, but also lets them make changes with confidence and anticipate system failures or performance hiccups before they occur.

Monitoring Webapp Performance with Sitespeed

In today's digital landscape, optimal web application performance is crucial for business success. Slow loading times, unresponsive pages, and inefficient code can drive away users and harm your reputation. Monitoring web app performance is therefore essential for catching these problems early and providing a smooth user experience. Sitespeed, a powerful web performance monitoring framework, analyzes metrics like page load time, resource usage, and user interactions to identify performance bottlenecks.

5 important features to look for in cybersecurity applications

In today’s digital landscape, organizations need the right cybersecurity applications to address evolving cyber threats effectively. To keep security teams aligned and streamline mission-critical workflows, one of the most important cybersecurity applications organizations need is a secure and efficient cybersecurity collaboration platform that enables seamless communication, information sharing, and coordinated incident response.

Enable and use GKE Control plane logs

Are you having issues with the control plane components in your GKE cluster? Would you like visibility into the control plane side of the cluster so you can troubleshoot those issues yourself? Then GKE control plane logs are a great way to gain insight into what's going on with your cluster. In this video, we give a quick overview of control plane components and logs, and show how to enable control plane logs on new and existing GKE clusters. Watch this video to learn how to use control plane logs to troubleshoot webhook and control plane latency issues in GKE clusters.