The latest news and information on DevOps, CI/CD, automation, and related technologies.

Charmed Kubeflow 1.6 Beta

Kubeflow 1.6 is almost here! 🎉 The open source MLOps platform of choice keeps evolving year over year, growing in popularity and in available features. Get the latest news about the changes in this release from two of the engineers who were part of the upstream release team. We will be talking about Pipelines, Katib, and the news about the scheduler.

The 2022 Managed Kubernetes Showdown: GKE vs AKS vs EKS

Kubernetes provides an abundance of benefits, but those who use it are well aware that running the platform independently often requires considerable effort and skill. Rather than shouldering that work on their own, organizations can pay for a managed Kubernetes service instead. This is where Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS) come in.

A new channel per incident - helpful or harmful?

I caught the tail end of a Twitter thread the other day which centred on the use of Slack channels for incidents, and whether creating a new channel for each new incident is helpful or harmful. It turns out this is a much more emotive subject than I thought, and since I have opinions, I thought I’d share them!

Uptime + Squadcast Integration: Routing Alerts Made Easy

Uptime is a site monitoring solution that checks various endpoints and notifies users via push notifications when downtime is detected. It collects and stores downtime and response time data, which is then made available to users as reports. If you use Uptime for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Uptime to the right users in Squadcast. The steps below will help you set up the Uptime and Squadcast integration.

See the big picture with the Service Dependency Graph

Understanding the impact and scope of an incident when degradation occurs is critical to bringing your service back online. This requires modeling the many downstream and upstream relationships between your services. Our new Service Dependency Graph provides a shortcut: a way to surface dependencies quickly, understand the relationships between services, and determine the scope or impact of an incident.
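
To make the idea of scoping an incident through service relationships concrete, here is a minimal Python sketch. It is not the product's implementation: the service names, the `DEPENDS_ON` map, and the `impacted_by` helper are all hypothetical, chosen only to illustrate how walking a dependency graph can reveal which services an outage may affect.

```python
from collections import deque

# Hypothetical service map (an assumption for illustration):
# each key depends on the services in its list.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "cart": ["database"],
    "payments": ["database"],
    "search": ["index"],
    "database": [],
    "index": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can ask "who depends on this service?"
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_by("database", DEPENDS_ON)))
```

In this toy map, a degraded `database` reaches `cart` and `payments` directly and `checkout` through them, while `search` is unaffected. A real dependency graph would be built from service metadata rather than a hand-written dictionary.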

geeks+gurus: Rise of SRE - Survey Insights

Site Reliability Engineering (SRE) continues to rise in adoption. Teams that leverage SRE “good” practices are benefitting: individuals are excited about their jobs, and IT and the business are collaborating more efficiently. Sound interesting? We hope so, as there are a few key insights you should know. Join us to learn more about the exciting journey of SRE. We have partnered with DevOps Institute (DOI) to conduct their inaugural 2022 Global SRE Pulse Survey, and we are excited to share the pulse on SRE.