D2iQ

Using Konvoy to Patch your Cluster Infrastructure (Part 1)

Recently we hit the infamous kmem bug in our internal production Konvoy cluster. We discovered the issue after users began reporting that a particular CI job was failing intermittently throughout the cluster, with errors appearing in both the Pod logs and the kernel logs.

Stabilizing Marathon: Part III

So far we have covered team culture, which amplifies our code culture and design. That was fairly abstract, and you’ll be forgiven if you skipped right away to this part. I will cover our test and release pipeline, the thing that has probably had the biggest impact on Marathon’s stability. The pipeline enabled us to discover issues before our users did. I will first give an overview of the pipeline stages and then dive deep into the Loop. You will soon see what I mean by that.

Stabilizing Marathon: Part II

Part I covered our team culture, which applies to many different types of work and teams. This part covers the software engineering best practices that help us stabilize Marathon. Marathon is written in Scala and makes heavy use of Akka Actors and Streams. I probably don’t have to mention that Scala’s type system and its immutable data structures prevent a lot of bugs before we even run unit tests.

Stabilizing Marathon: Part I

This is a review of the last three years that we spent stabilizing Marathon. Marathon is the central workload scheduler in DC/OS. Most of the time when you launch an app or a service on DC/OS, it is Marathon that starts it on top of Apache Mesos. Mesos manages the compute and storage resources and Marathon orchestrates the workload. We sometimes dub it the “init.d of DC/OS”. Being such an integral part of DC/OS, we must ensure that it keeps functioning.

Double Header: Konvoy 1.5 and Kommander 1.1 Are GA!

Today we made Konvoy 1.5 and Kommander 1.1 generally available. In January, D2iQ defined a 12-month roadmap for Kommander and Konvoy. With these newest releases focused on the Single Enterprise Experience, that mission is halfway complete. Here are some of the highlights of the latest releases.

Q&A with Ziff Media Group: Why They Made the Switch to Kubernetes

Today’s leading companies are one step ahead of their competitors as they adopt new tools and disciplines emerging from the cloud native landscape. That was the case for Ziff Media Group, which is a collection of several media web properties including pcmag.com, mashable.com, deals.com, offers.com, and more.

KUDO for Kubeflow: The Enterprise Machine Learning Platform

Machine learning is the power cable for your business. Without it, your data center is a museum of hard drives. While machine learning can supercharge data-driven businesses, it requires both expertise and a complex suite of technologies to make it work. D2iQ’s KUDO for Kubeflow, which is in technical preview, is the enterprise platform designed to take you from prototype to production in no time.

Introducing Conductor

It comes as no surprise that the demand for Kubernetes is skyrocketing across the industry. According to the CNCF’s 2019 survey, 78% of respondents are using Kubernetes in production today. This growth is contributing to a surge in demand for talent: there are over 100,000 cloud native job postings across Dice and Indeed alone. The pool of people who have worked with Kubernetes and the adjacent technologies is limited, and demand is growing.