

Building ETL Pipeline using Medallion Architecture

4 Scenarios
3 Hours 20 Minutes
Intermediate
Industry

  • general
  • e-commerce

Skills

  • approach
  • data-understanding
  • data-storage
  • data-quality
  • batch-etl
  • data-wrangling
  • data-modelling
  • quality

Tools

  • databricks
  • sql
  • python

Learning Objectives

Compare INSERT INTO, INSERT OVERWRITE, and COPY INTO for loading data into Delta tables.
Automate raw data ingestion into the Bronze layer using COPY INTO.
Apply schema enforcement and data cleaning techniques in the Silver layer.
Organize data in the Gold layer into fact and dimension tables for analytics.
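As a preview of the first objective, the three loading commands behave quite differently. The sketch below is illustrative only; the table name `bronze.orders`, the staging view `staging_orders`, and the landing path `/mnt/raw/orders/` are hypothetical placeholders:

```sql
-- INSERT INTO appends rows; re-running it duplicates the data.
INSERT INTO bronze.orders SELECT * FROM staging_orders;

-- INSERT OVERWRITE replaces the table's existing contents.
INSERT OVERWRITE bronze.orders SELECT * FROM staging_orders;

-- COPY INTO is idempotent: it loads only files it has not seen before,
-- which makes it the natural fit for incremental batch ingestion.
COPY INTO bronze.orders
FROM '/mnt/raw/orders/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');
```

Because COPY INTO tracks already-loaded files, it is the command the module automates for the Bronze layer.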

Overview

This module focuses on building an efficient ETL pipeline using the Medallion Architecture. You will start with the Bronze layer, exploring different data-loading methods like INSERT INTO, INSERT OVERWRITE, and COPY INTO to ingest raw data into Delta tables while ensuring scalability and incremental processing. Next, you will refine data in the Silver layer by enforcing schemas, cleaning, and structuring it for further analysis.
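The Silver-layer refinement described above can be sketched as a single transform. This is a minimal, hypothetical example (the tables `bronze.orders` and `silver.orders` and their columns are assumptions, not the module's actual schema):

```sql
-- Enforce types, drop rows missing a key, and deduplicate on order_id,
-- keeping the most recent record per order.
CREATE OR REPLACE TABLE silver.orders AS
SELECT
  CAST(order_id AS BIGINT)          AS order_id,
  CAST(order_ts AS TIMESTAMP)       AS order_ts,
  TRIM(customer_id)                 AS customer_id,
  CAST(amount   AS DECIMAL(10, 2))  AS amount
FROM bronze.orders
WHERE order_id IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_ts DESC) = 1;
```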

Finally, you will organize data in the Gold layer, optimizing it into fact and dimension tables for analytics and business insights. By the end, you will understand how to design a reliable data pipeline in Databricks.
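To make the Gold-layer modelling concrete, here is one possible shape for a dimension and a fact table built from the cleaned Silver data. All table and column names are hypothetical:

```sql
-- A customer dimension: one row per customer attribute set.
CREATE OR REPLACE TABLE gold.dim_customer AS
SELECT DISTINCT customer_id, customer_name, country
FROM silver.customers;

-- A daily sales fact table, aggregated for analytics queries.
CREATE OR REPLACE TABLE gold.fact_daily_sales AS
SELECT
  customer_id,
  DATE(order_ts) AS order_date,
  COUNT(*)       AS order_count,
  SUM(amount)    AS total_amount
FROM silver.orders
GROUP BY customer_id, DATE(order_ts);
```

Splitting descriptive attributes into dimensions and measures into facts keeps BI queries simple joins plus aggregations.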

Prerequisites

  • Basic understanding of Databricks and Delta Lake.
  • Familiarity with ETL concepts and SQL.
  • Familiarity with the Hive Metastore.