
Tracking Incremental Data with Change Data Feed (CDF) in Databricks

2 Scenarios
2 Hours
Beginner
3 credits
Industry: general
Skills: approach, data-understanding, data-storage
Tools: databricks

Learning Objectives

Sign up for Databricks Free Edition and create notebooks with serverless compute
Create schemas, volumes, and upload CSV files to Unity Catalog volumes (see the setup sketch after this list)
Understand the limitations of manual watermarking (created_at/updated_at columns) for change tracking
Use COPY INTO for idempotent file loading that prevents duplicate data
Track inserts, updates, and deletes automatically without application code changes
Query changed records by version or timestamp using table_changes()
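
For instance, a minimal setup sketch in Databricks SQL, using illustrative names (catalog main, schema cdf_demo, volume raw_files, table customers) that are assumptions, not the masterclass's own:

  -- Schema and a managed volume to hold the uploaded CSV files
  CREATE SCHEMA IF NOT EXISTS main.cdf_demo;
  CREATE VOLUME IF NOT EXISTS main.cdf_demo.raw_files;

  -- Delta table with Change Data Feed enabled: inserts, updates,
  -- and deletes are then recorded at the storage layer automatically
  CREATE TABLE IF NOT EXISTS main.cdf_demo.customers (
    customer_id INT,
    name        STRING,
    email       STRING
  )
  TBLPROPERTIES (delta.enableChangeDataFeed = true);

On an existing table, the same property can be switched on with ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true).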

Overview

In this masterclass, you’ll learn how to track and process incremental data changes automatically using Change Data Feed (CDF) in Delta Lake.

What you’ll learn:

  • Set up Databricks Free Edition and create your first notebook with serverless compute.

  • Understand how CDF replaces manual watermarking (no need for created_at / updated_at columns).

  • See how Delta Lake tracks all changes directly at the storage layer.

  • Perform hands-on exercises using real customer data.

  • Use COPY INTO for idempotent data loading.

  • Query only changed records with table_changes() (this and COPY INTO are sketched below).
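
A hedged sketch of these last two steps, reusing the assumed names from the setup sketch above:

  -- Idempotent load: COPY INTO tracks which files it has already
  -- ingested, so re-running the statement does not duplicate rows
  COPY INTO main.cdf_demo.customers
  FROM '/Volumes/main/cdf_demo/raw_files/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true');

  -- Read only the rows that changed since table version 2; the
  -- _change_type column marks each row as insert, update_preimage,
  -- update_postimage, or delete
  SELECT * FROM table_changes('main.cdf_demo.customers', 2);

table_changes() also accepts a starting timestamp instead of a version number.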

Prerequisites

  • Basic understanding of SQL (SELECT, INSERT, CREATE TABLE)
  • Familiarity with data pipeline concepts (ETL, incremental loading)
  • Basic knowledge of PySpark (reading data into a DataFrame)