Introduction to Spark as Distributed Computing Framework

3 Scenarios
3 Hours
Industry: e-commerce
Skills: approach, distributed-processing
Tools: spark

Learning Objectives

  • Understand the advantages of Spark's in-memory processing and its speed compared to Hadoop.
  • Learn how Spark supports multiple programming languages.
  • Learn to use the PySpark DataFrame API for structured data processing.

Overview

On Amazon's Data team, Python has traditionally played a crucial role in extracting insights, such as identifying loyal customers and gauging the effectiveness of marketing campaigns across product categories.

However, as the scope of data analysis widened to cover more product categories, the data files grew larger and Python alone began to struggle with rising ingestion times. This highlighted a growing problem in many data-heavy industries: the need for a more robust way to manage large, complex data sets efficiently.

Prerequisites

  • Data Wrangling using Python