Tracking Incremental Data with Change Data Feed (CDF) in Databricks
2 Scenarios
2 Hours
Beginner
3 Credits

Industry: general
Skills: approach, data-understanding, data-storage
Tools: databricks
Learning Objectives
- Sign up for Databricks Free Edition and create notebooks with serverless compute
- Create schemas, volumes, and upload CSV files to Unity Catalog volumes (see the sketch after this list)
- Understand the limitations of manual watermarking (created_at/updated_at columns) for change tracking
- Use COPY INTO for idempotent file loading that prevents duplicate data
- Track inserts, updates, and deletes automatically without application code changes
- Query changed records by version or timestamp using table_changes()
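A minimal sketch of the Unity Catalog setup this objective covers, assuming an illustrative catalog main, schema cdf_demo, and volume raw_files (the lab may use different names):

```sql
-- Create a schema and a managed volume in Unity Catalog;
-- all names here are illustrative.
CREATE SCHEMA IF NOT EXISTS main.cdf_demo;
CREATE VOLUME IF NOT EXISTS main.cdf_demo.raw_files;

-- Files uploaded to this volume (for example via Catalog Explorer)
-- appear under the path /Volumes/main/cdf_demo/raw_files/
```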
Overview
In this masterclass, you’ll learn how to track and process incremental data changes automatically using Change Data Feed (CDF) in Delta Lake.
What you’ll learn:
- Set up Databricks Free Edition and create your first notebook with serverless compute.
- Understand how CDF replaces manual watermarking (no need for created_at/updated_at columns); see the first sketch after this list.
- See how Delta Lake tracks all changes directly at the storage layer.
- Perform hands-on exercises using real customer data.
- Use COPY INTO for idempotent data loading; see the second sketch below.
- Query only changed records with table_changes(); see the third sketch below.
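To make the watermarking point concrete, here is a minimal sketch of enabling CDF on a Delta table. The catalog, schema, table, and column names are illustrative assumptions, not taken from the lab:

```sql
-- Create a Delta table with Change Data Feed enabled from the start
-- (names such as main.cdf_demo.customers are illustrative).
CREATE TABLE IF NOT EXISTS main.cdf_demo.customers (
  customer_id INT,
  name        STRING,
  email       STRING
)
TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Or turn CDF on for an existing Delta table:
ALTER TABLE main.cdf_demo.customers
SET TBLPROPERTIES (delta.enableChangeDataFeed = true);
```

Once the property is set, Delta Lake records row-level changes at the storage layer, so no created_at/updated_at columns or application-side bookkeeping are required.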
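For idempotent loading, a sketch of COPY INTO reading CSV files from a Unity Catalog volume; the volume path and format options are assumptions for illustration:

```sql
-- COPY INTO remembers which files it has already ingested, so
-- re-running this statement does not load the same rows twice.
COPY INTO main.cdf_demo.customers
FROM '/Volumes/main/cdf_demo/raw_files/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');
```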
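Finally, a sketch of querying the change feed with table_changes(); the version numbers and timestamp are illustrative:

```sql
-- Changes between two table versions (inclusive):
SELECT * FROM table_changes('main.cdf_demo.customers', 1, 3);

-- Changes since a given timestamp:
SELECT _change_type, _commit_version, customer_id, name
FROM table_changes('main.cdf_demo.customers', '2024-01-01T00:00:00');
```

Each returned row carries the _change_type, _commit_version, and _commit_timestamp metadata columns, so downstream jobs can distinguish inserts, updates (pre- and post-image rows), and deletes.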
Prerequisites
- Basic understanding of SQL (SELECT, INSERT, CREATE TABLE)
- Familiarity with data pipeline concepts (ETL, incremental loading)
- Basic knowledge of PySpark (reading data into a DataFrame)