Distributed Machine Learning With PySpark
Spark is a fast, general-purpose cluster-computing framework for processing big data. In this post, we'll cover how Spark works under the hood and what you need to know to effectively perform distributed machine learning with PySpark. The post assumes basic familiarity with Python and with core machine learning concepts such as regression and gradient descent.
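To follow along, you'll need PySpark installed (e.g. via `pip install pyspark`). As a minimal sketch of the starting point for everything that follows, here is how you might spin up a local session; the app name and `local[*]` master are just illustrative choices, not fixed requirements:

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session running locally,
# using all available cores on this machine.
spark = (
    SparkSession.builder
    .appName("distributed-ml-demo")  # placeholder name
    .master("local[*]")
    .getOrCreate()
)

print(spark.version)
```

On a real cluster you would point `.master()` at your cluster manager (YARN, Kubernetes, or standalone) instead of `local[*]`, but the rest of the code stays the same.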