Orchestrating Data Analysis Workflows using Airflow

1 Scenario
1 Hour

Industry: Telecom
Skills: Approach, Quality
Tools: Python, Airflow

Learning Objectives

  • Extend and enhance an existing ETL pipeline in Apache Airflow.
  • Implement error handling and retry mechanisms in Airflow tasks (see the first sketch after this list).
  • Manage timezone differences in datetime fields between external databases and your Airflow environment (see the second sketch after this list).
  • Schedule and manage DAG execution.
  • Resume and recover failed DAGs.
  • Configure alerts and notifications.
  • Understand default and non-default DAG parameters.
  • Perform data analysis and load the results.
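
A minimal sketch of how retries, failure alerts, and scheduling come together as DAG configuration, assuming Airflow 2.4+. The DAG id, owner, and alert address are hypothetical placeholders, not values prescribed by the project:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    default_args = {
        "owner": "data-eng",                   # hypothetical owner
        "retries": 3,                          # re-run a failed task up to 3 times
        "retry_delay": timedelta(minutes=5),   # wait between attempts
        "email": ["alerts@example.com"],       # hypothetical alert recipient
        "email_on_failure": True,              # notify once all retries are exhausted
    }

    def extract_billing_data(**context):
        # Placeholder for an error-prone step; any exception raised here
        # triggers the retry behavior defined in default_args.
        raise NotImplementedError("extraction logic goes here")

    with DAG(
        dag_id="telecom_backup_and_analysis",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                     # use schedule_interval on Airflow < 2.4
        catchup=False,                         # skip backfilling missed runs on resume
        default_args=default_args,             # defaults applied to every task
    ) as dag:
        PythonOperator(
            task_id="extract_billing_data",
            python_callable=extract_billing_data,
        )

Anything set in default_args applies to every task in the DAG unless a task overrides it, while arguments such as schedule and catchup belong to the DAG itself; this is roughly the distinction the "default and non-default DAG parameters" objective refers to.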
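
Timezone mismatches usually surface as naive datetimes stored in the source database's local time while Airflow schedules and logs in UTC. Below is a small sketch of one way to normalize such values, assuming (purely for illustration) that the external database stores timestamps in Asia/Kolkata:

    from datetime import datetime
    from zoneinfo import ZoneInfo  # standard library, Python 3.9+

    def to_utc(naive_dt: datetime, source_tz: str = "Asia/Kolkata") -> datetime:
        # Attach the assumed source timezone, then convert to UTC so the value
        # is directly comparable with Airflow's UTC execution timestamps.
        return naive_dt.replace(tzinfo=ZoneInfo(source_tz)).astimezone(ZoneInfo("UTC"))

    payment_ts = datetime(2024, 1, 15, 9, 30)  # naive value read from the external DB
    print(to_utc(payment_ts))                  # 2024-01-15 04:00:00+00:00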

Overview

In this project, you will extend a previously built ETL pipeline for a telecom company by adding advanced analytical tasks in Apache Airflow. On top of the existing incremental data backup tasks, you will create two new analyses: a billing amount analysis and a late payment analysis. The primary goal is to process, analyze, and store telecom data in an S3 bucket while handling error-prone scripts, managing timezone differences between systems, scheduling DAG execution, and recovering from DAG failures.
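
As a rough sketch, the two analyses could be plain Python callables that read the backed-up data, compute their result with pandas, and upload it with Airflow's S3Hook (which requires the apache-airflow-providers-amazon package). The file paths, column names, bucket, and aws_default connection id below are assumptions, not part of the project specification:

    import pandas as pd
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    BUCKET = "telecom-analysis-results"  # hypothetical bucket name

    def billing_amount_analysis():
        # Summarize billing amounts per customer (stand-in logic; real columns
        # depend on the data produced by the existing backup tasks).
        df = pd.read_csv("/tmp/billing.csv")
        summary = df.groupby("customer_id")["billing_amount"].sum().reset_index()
        S3Hook(aws_conn_id="aws_default").load_string(
            summary.to_csv(index=False),
            key="analysis/billing_summary.csv",
            bucket_name=BUCKET,
            replace=True,
        )

    def late_payment_analysis():
        # Keep only the payments made after their due date.
        df = pd.read_csv("/tmp/payments.csv", parse_dates=["paid_date", "due_date"])
        late = df[df["paid_date"] > df["due_date"]]
        S3Hook(aws_conn_id="aws_default").load_string(
            late.to_csv(index=False),
            key="analysis/late_payments.csv",
            bucket_name=BUCKET,
            replace=True,
        )

Each callable would then be wrapped in a PythonOperator task and wired downstream of the existing incremental backup tasks.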

Prerequisites

  • Basic understanding of Apache Airflow
  • Familiarity with ETL processes and data pipelines
  • Experience with AWS S3 for data storage
  • Basic knowledge of SQL and Python
  • Understanding of timezones and datetime handling in Python