Building ETL Pipeline using Medallion Architecture
4 Scenarios
3 Hours 20 Minutes
Intermediate

Industry
general
e-commerce
Skills
approach
data-understanding
data-storage
data-quality
batch-etl
data-wrangling
data-modelling
quality
Tools
databricks
sql
python
Learning Objectives
Compare INSERT INTO, INSERT OVERWRITE, and COPY INTO for loading data into Delta tables (contrasted in the sketch after this list).
Automate raw data ingestion into the Bronze layer using COPY INTO.
Apply schema enforcement and data cleaning techniques in the Silver layer.
Organize curated data into fact and dimension tables in the Gold layer for analytics.
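As a quick orientation before the scenarios, the sketch below contrasts the three loading statements against a hypothetical `bronze.orders_raw` Delta table. The table name, landing path, and JSON format are illustrative assumptions, not part of the module's dataset.

```sql
-- Schemaless placeholder table; COPY INTO fills in the schema on first load
-- (a Databricks-specific convenience for COPY INTO targets).
CREATE TABLE IF NOT EXISTS bronze.orders_raw;

-- COPY INTO tracks which files it has already loaded, so re-runs are
-- idempotent: only new files under the path are ingested. This makes it
-- the natural default for incremental Bronze ingestion.
COPY INTO bronze.orders_raw
FROM '/landing/orders/'
FILEFORMAT = JSON
COPY_OPTIONS ('mergeSchema' = 'true');

-- INSERT INTO appends whatever the query returns; re-running the same
-- batch writes duplicate rows.
INSERT INTO bronze.orders_raw
SELECT * FROM json.`/landing/orders/batch_001.json`;

-- INSERT OVERWRITE atomically replaces the table's entire contents;
-- suited to full refreshes, not incremental loads.
INSERT OVERWRITE bronze.orders_raw
SELECT * FROM json.`/landing/orders/batch_001.json`;
```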
Overview
This module focuses on building an efficient ETL pipeline using the Medallion Architecture. You will start with the Bronze layer, comparing data-loading methods such as INSERT INTO, INSERT OVERWRITE, and COPY INTO to ingest raw data into Delta tables while keeping ingestion scalable and incremental. Next, you will refine data in the Silver layer by enforcing schemas, cleaning records, and structuring them for further analysis.
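Continuing the hypothetical orders example from above, a minimal Silver step might look like the following. TRY_CAST turns malformed values into NULLs instead of failing the job, and the resulting table carries an explicit schema that Delta enforces on every subsequent write.

```sql
-- Silver table with a typed, enforced schema; rows whose key fails to
-- cast are filtered out rather than crashing the pipeline.
CREATE OR REPLACE TABLE silver.orders AS
SELECT
  TRY_CAST(order_id    AS BIGINT)         AS order_id,
  TRY_CAST(customer_id AS BIGINT)         AS customer_id,
  TRY_CAST(order_ts    AS TIMESTAMP)      AS order_ts,
  TRY_CAST(amount      AS DECIMAL(10, 2)) AS amount,
  LOWER(TRIM(status))                     AS status
FROM bronze.orders_raw
WHERE TRY_CAST(order_id AS BIGINT) IS NOT NULL;

-- From here on, Delta rejects writes with mismatched types or
-- unexpected columns, keeping the Silver schema stable.
```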
Finally, you will organize data in the Gold layer, optimizing it into fact and dimension tables for analytics and business insights. By the end, you will understand how to design a reliable data pipeline in Databricks.
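To make the fact/dimension split concrete, here is one possible Gold-layer shape for the same hypothetical data: a date dimension plus an orders fact table, followed by the kind of rollup query the layer is designed to serve.

```sql
-- Dimension: one row per calendar date referenced by any order.
CREATE OR REPLACE TABLE gold.dim_date AS
SELECT DISTINCT
  CAST(order_ts AS DATE) AS date_key,
  YEAR(order_ts)         AS year,
  MONTH(order_ts)        AS month,
  DAYOFWEEK(order_ts)    AS day_of_week
FROM silver.orders;

-- Fact: one row per order, keyed to the date dimension.
CREATE OR REPLACE TABLE gold.fact_orders AS
SELECT
  order_id,
  customer_id,
  CAST(order_ts AS DATE) AS date_key,
  amount,
  status
FROM silver.orders;

-- Typical Gold-layer question: daily revenue.
SELECT d.date_key, SUM(f.amount) AS revenue
FROM gold.fact_orders f
JOIN gold.dim_date d ON f.date_key = d.date_key
GROUP BY d.date_key;
```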
Prerequisites
- Basic understanding of Databricks and Delta Lake.
- Familiarity with ETL concepts and SQL.
- Familiarity with the Hive Metastore.