
Latest Posts

Tutorial: How to Connect Jupyter Notebooks to Ocean for Apache Spark

Jupyter Notebook is a web-based interactive computational environment for creating notebook documents. It supports multiple programming languages, such as Python, Scala, and R, and is widely used for data engineering, data analysis, machine learning, and other interactive, exploratory computing. Think of notebooks as a developer console or terminal, but with an intuitive UI that allows for efficient iteration, debugging, and exploration.
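
To give a flavor of the workflow, here is a minimal sketch of a first notebook cell using PySpark. How the notebook actually attaches to an Ocean for Apache Spark cluster is covered in the tutorial itself; the snippet below only shows the kind of interactive cell you would run once connected, and the app name and sample data are illustrative.

```python
# A typical first cell in a Spark-backed notebook. The session here is created
# locally for illustration; with Ocean for Apache Spark the session is provided
# by the kernel attached to the remote cluster (see the tutorial for setup).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("jupyter-exploration")  # shows up in the Spark UI
    .getOrCreate()
)

# Quick sanity check: build a small DataFrame and inspect it interactively.
df = spark.createDataFrame(
    [("notebooks", 1), ("spark", 2), ("kubernetes", 3)],
    ["keyword", "rank"],
)
df.show()
```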

Ocean for Apache Spark goes GA on AWS

When Apache Spark introduced native support for Kubernetes it was a game changer for big data. Speed, scale and flexibility are now at the fingertips of data teams, if they can master Kubernetes. It’s an uphill climb for even experienced DevOps teams. At Spot by NetApp, we’ve seen first-hand the challenges that companies are facing as they navigate the complexities of operating large-scale Kubernetes applications.

Orchestrate Spark pipelines with Airflow on Ocean for Apache Spark

Running Apache Spark applications on Kubernetes has a lot of benefits, but operating and managing Kubernetes at scale poses significant challenges for data teams. With the recent addition of Ocean for Apache Spark to Spot’s suite of Kubernetes solutions, data teams have the power and flexibility of Kubernetes without the complexities. A cloud-native managed service, Ocean Spark automates cloud infrastructure and application management for Spark-on-Kubernetes.
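
As a rough sketch of what such orchestration can look like, the Airflow DAG below chains two Spark steps. The `submit_spark_app()` callable is a placeholder standing in for however you submit applications to Ocean Spark (for example its REST API or an Airflow provider); the DAG id, schedule, and task names are illustrative, not the product's actual integration.

```python
# Illustrative Airflow DAG with a placeholder submission function; replace
# submit_spark_app() with your actual Ocean Spark submission mechanism.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def submit_spark_app(app_name: str) -> None:
    # Placeholder: call the Ocean Spark submission endpoint / SDK here.
    print(f"Submitting Spark application: {app_name}")


with DAG(
    dag_id="ocean_spark_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(
        task_id="ingest_raw_data",
        python_callable=submit_spark_app,
        op_kwargs={"app_name": "ingest-raw-data"},
    )
    transform = PythonOperator(
        task_id="transform_to_parquet",
        python_callable=submit_spark_app,
        op_kwargs={"app_name": "transform-to-parquet"},
    )
    ingest >> transform
```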

Spot Ocean now supports Kubernetes pod topology spread constraints

As a premium autoscaler for containers and Kubernetes applications, Spot Ocean automatically and continuously executes scaling actions based on the resource requests and constraints specified for pods and their containers. This container-driven autoscaling approach is core to how Ocean leverages and optimizes the compute infrastructure required to run containers in the cloud.
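
For reference, this is what a standard Kubernetes pod topology spread constraint looks like, expressed here with the official Kubernetes Python client rather than YAML; the pod name, labels, and image are illustrative, and the snippet only builds the object without submitting it. Ocean takes constraints like this into account when provisioning nodes.

```python
# Generic pod topology spread constraint built with the official Kubernetes
# Python client (pip install kubernetes); names and labels are illustrative.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="web-0", labels={"app": "web"}),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(name="web", image="nginx:1.25"),
        ],
        topology_spread_constraints=[
            client.V1TopologySpreadConstraint(
                max_skew=1,                                  # allow at most 1 pod of imbalance
                topology_key="topology.kubernetes.io/zone",  # spread across availability zones
                when_unsatisfiable="DoNotSchedule",
                label_selector=client.V1LabelSelector(match_labels={"app": "web"}),
            )
        ],
    ),
)
```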

Ocean explained: Ocean controller deepdive

As a managed data plane service for containerized applications, Spot Ocean provides a serverless experience for running containers in the cloud. Ocean integrates with the control plane of your choice, and handles key areas of infrastructure management, from provisioning compute and autoscaling, to pricing optimization and right-sizing. A core component of Ocean’s architecture is the Ocean controller, which is how Ocean and your Kubernetes cluster integrate and interact.

Cluster Roll feature enhancements now available

Spot by NetApp’s Ocean includes a powerful feature called “cluster roll.” This feature simplifies applying changes to Kubernetes worker nodes. Typical changes include applying a new image, modifying or adding user data, and updating security groups. A cluster roll applies these changes without having to disable the Ocean autoscaler. It also removes the need for you to manually attach new nodes or remove replaced nodes from the cluster.
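
A cluster roll can be triggered from the console or programmatically. The sketch below shows a plain-HTTP call against the Spot API; the endpoint path, payload fields, and the cluster ID are assumptions based on the public API's conventions and should be checked against the current API reference before use.

```python
# Hedged sketch: initiating an Ocean cluster roll over the Spot API.
# Endpoint path, payload fields, and IDs are assumptions -- verify against the
# current Spot API documentation before relying on them.
import os
import requests

SPOT_TOKEN = os.environ["SPOT_TOKEN"]   # personal Spot API token
OCEAN_CLUSTER_ID = "o-12345678"         # placeholder Ocean cluster ID

response = requests.post(
    f"https://api.spotinst.io/ocean/aws/k8s/cluster/{OCEAN_CLUSTER_ID}/roll",
    headers={"Authorization": f"Bearer {SPOT_TOKEN}"},
    json={"roll": {"batchSizePercentage": 20, "comment": "apply new AMI"}},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```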

Improve Apache Spark performance with the S3 magic committer

Most Apache Spark users overlook the choice of an S3 committer (the protocol Spark uses when writing output results to S3), because it is quite complex and documentation about it is scarce. This choice has a major impact on performance whenever you write data to S3. A significant portion of a typical Spark job's runtime is spent writing to S3, so choosing the right S3 committer is important for AWS Spark users.
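
As a sketch of how the S3A magic committer can be enabled, the configuration below follows the Hadoop and Spark cloud-integration documentation; it assumes a Spark build on Hadoop 3.x with the spark-hadoop-cloud module on the classpath, and exact property requirements can vary by version.

```python
# Sketch: enabling the S3A "magic" committer on a SparkSession. Assumes Spark on
# Hadoop 3.x with the spark-hadoop-cloud module available; property names follow
# the Hadoop/Spark cloud-integration docs and may vary by version.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-magic-committer-demo")
    .config("spark.hadoop.fs.s3a.committer.name", "magic")
    .config("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
    .config(
        "spark.sql.sources.commitProtocolClass",
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    )
    .config(
        "spark.sql.parquet.output.committer.class",
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    )
    .getOrCreate()
)

# DataFrame writes to s3a:// paths now use the magic committer instead of the
# slow rename-based default committer, e.g.:
# spark.range(1000).write.parquet("s3a://my-bucket/output/")  # placeholder bucket
```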

Ocean Insights now available for Google Cloud

As companies move more applications into the cloud and package them into containers, environments become more complex and visibility becomes limited. Much of the infrastructure is abstracted away and delivered by the hyperscalers, and that abstraction creates an opaqueness that makes it hard to control costs and understand resource utilization. As a result, many companies are experiencing high cloud bills and significant cloud waste.

Ocean explained: container-driven autoscaling with Kubernetes

Whether you’re using a managed Kubernetes service like AWS EKS, GCP GKE or Azure AKS, or self-managing a DIY cluster deployed with open source tools like kops and Kubespray, the underlying hardware can vary from container to container. Each container requires specific resources (CPU/memory/GPU/network/disk) and as long as the underlying infrastructure can provide those resources, the container will be able to execute its business logic.
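
Those per-container requirements are what a container-driven autoscaler plans around. The snippet below shows what such requests and limits look like when built with the official Kubernetes Python client; the container name, image, and values are arbitrary illustrations.

```python
# Illustration of the per-container resource requests an autoscaler works from,
# built with the official Kubernetes Python client; all values are arbitrary.
from kubernetes import client

container = client.V1Container(
    name="api-server",
    image="example/api:1.0",  # placeholder image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "1Gi"},  # what the scheduler/autoscaler plans for
        limits={"cpu": "1", "memory": "2Gi"},       # hard ceiling for the container
    ),
)
print(container.resources.requests)
```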