Data Analysis using PySpark

2 Scenarios, 55 Minutes, Beginner

Industry: e-commerce
Skills: data-wrangling, data-understanding, batch-etl, data-storage
Tools: databricks, sql, spark, python

Learning Objectives

  • Perform PySpark operations on structured data from a Data Lake.
  • Apply transformations such as filtering, aggregation, and joins.
  • Work with nested JSON data using explode and dot notation.
  • Flatten JSON data so that nested fields are easier to access and analyze.

Overview

This module begins with analyzing structured data from a Data Lake using PySpark, applying operations such as filtering, aggregations, and joins. It then covers JSON data, focusing on nested structures: using explode, dot notation, and flattening techniques to extract and process fields efficiently.

Prerequisites

  • Basic understanding of Databricks and PySpark.