A holistic approach to securing Spark-based data engineering

Canonical

Jun 5, 2023


Apache Spark is an open-source toolkit that helps users develop parallel, distributed data engineering and machine learning applications and run them at scale. In this webinar, product manager Rob Gibbon and senior information security lead Massimiliano Gori survey the state of big data security best practices. They outline both high-level architectures and pragmatic steps you can take to secure your Spark applications, wherever they may be running.

Watch this webinar, in which we discuss the following:

An introduction to Apache Spark: what it is and how it works
The motives and techniques of bad actors
How to identify and prioritize security requirements
Pragmatic steps to secure Spark running on Kubernetes and object storage

#apachespark #kubernetes #bigdata

00:00 Introduction

03:46 What is Apache Spark?

10:36 Potential security threats for Apache Spark

21:00 How to identify and prioritize security requirements

28:20 Pragmatic steps to secure Apache Spark

Canonical Charmed Spark:
https://ubuntu.com/data/spark

Blog: Big data security foundations in five steps:
https://ubuntu.com/blog/big-data-security-foundations-in-five-steps

Whitepaper: Accelerating Apache Kafka:
https://ubuntu.com/engage/7-approaches-accelerating-kafka-whitepaper

Whitepaper: Kubernetes operators explained:
https://ubuntu.com/engage/kubernetes-operators-explained-whitepaper

Follow our social accounts:

LinkedIn:
https://bit.ly/3Jw6jGN
Twitter:
https://bit.ly/3OXSIJE
Facebook:
https://bit.ly/3Q15Yyn
Instagram:
https://bit.ly/3vE7Kxk

For more information, visit https://www.ubuntu.com and https://www.canonical.com

#linux #ubuntu #canonical #opensource