
Implementing Medallion Architecture through Databricks on GCP

1 Scenario
2 Hours
Industry: e-commerce
Skills: approach, quality, data-understanding, data-storage, data-quality, data-wrangling, batch-etl, cloud-management, distributed-processing
Tools: databricks, spark, google-cloud

Learning Objectives

Ingest and process raw data on Databricks
Implement the Medallion (Bronze/Silver/Gold) architecture (a minimal sketch follows this list)
Design and implement dimensional models for the Gold layer
Load only new or updated data incrementally to keep ETL processes efficient
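
To make these objectives concrete, below is a minimal sketch of a Bronze → Silver → Gold flow in PySpark with Delta Lake on Databricks. The table names (bronze.orders_raw, silver.orders, gold.customer_lifetime_value), column names, and the GCS path are illustrative assumptions, not part of the actual GlobalMart dataset, and the bronze/silver/gold schemas are assumed to already exist.

```python
# A minimal sketch of a Bronze -> Silver -> Gold Medallion flow with PySpark and Delta Lake.
# All table/column names and the source path are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Databricks notebook

# Bronze: ingest raw files as-is, adding ingestion metadata.
bronze = (
    spark.read.format("json")
    .load("gs://globalmart-landing/orders/")           # hypothetical GCS landing path
    .withColumn("_ingested_at", F.current_timestamp())
)
bronze.write.format("delta").mode("append").saveAsTable("bronze.orders_raw")

# Silver: clean, deduplicate, and enforce a schema.
silver = (
    spark.read.table("bronze.orders_raw")
    .dropDuplicates(["order_id"])
    .filter(F.col("order_amount") > 0)
    .select("order_id", "customer_id", "order_date", "order_amount")
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregates ready for reporting.
gold = (
    spark.read.table("silver.orders")
    .groupBy("customer_id")
    .agg(F.sum("order_amount").alias("lifetime_value"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_lifetime_value")
```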

Overview

GlobalMart, an e-commerce startup, faces challenges with data inaccuracies, schema inconsistencies, and a lack of stakeholder trust in its data systems. What measures are necessary to address and resolve these issues?


GlobalMart is a startup revolutionizing the shopping experience for its customers, both in the retail landscape and the online marketplace. As GlobalMart continues to expand, it is increasingly relying on data-driven decision making.

For GlobalMart to be data-driven, stakeholders need to be provided with accurate, up-to-date data. Unfortunately, this has become a major challenge and bottleneck. The journey that started as a way to enhance operational efficiency and decision-making is now creating a lot of friction among stakeholders.

GlobalMart now faces the following challenges:

  • Data Silos and Absence of a Single Source of Truth
  • Data Inconsistency and Quality Issues
  • Lack of Access Control and Compliance Challenges
  • Complex and Time-Consuming Data Transformation Processes
  • Unclear Data Location and Origin Leading to Redundancy

These issues have eroded trust in the data systems, rendering them effectively useless. In this project, you will implement the following architecture, which addresses the problems GlobalMart currently faces in its data systems.

[Figure: Medallion (Bronze/Silver/Gold) architecture on Databricks]

Prerequisites

  • Understanding of the Medallion architecture
  • Understanding of how to manage incremental data loading (see the sketch after this list)
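
For reference, the following is a minimal sketch of incremental (upsert) loading into a Silver Delta table using MERGE. It reuses the hypothetical tables and columns from the earlier sketch; the hard-coded checkpoint value is an assumption and would normally come from a checkpoint or audit table.

```python
# A minimal sketch of incremental loading into silver.orders via Delta MERGE.
# Table/column names and the checkpoint value are hypothetical examples.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Only rows ingested since the last run count as "new or updated".
last_processed = "2024-01-01T00:00:00"  # in practice, read from a checkpoint/audit table
updates = (
    spark.read.table("bronze.orders_raw")
    .filter(F.col("_ingested_at") > F.lit(last_processed))
    .dropDuplicates(["order_id"])
    .select("order_id", "customer_id", "order_date", "order_amount")
)

target = DeltaTable.forName(spark, "silver.orders")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()      # update orders that changed
    .whenNotMatchedInsertAll()   # insert brand-new orders
    .execute()
)
```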