Batch Processing through Databricks on Azure - Part 02

5 Scenarios
16 Hours 53 Minutes

Industry: e-commerce
Skills: data-storage, data-quality, cloud-management, distributed-processing, code-versioning, data-wrangling, data-modelling, data-understanding
Tools: databricks, spark, sql, google-cloud, google-cloud-storage

Learning Objectives

  • Understand the Medallion architecture
  • Implement the Medallion architecture
  • Implement code versioning with Databricks and GitHub
  • Design and implement dimensional models for the Gold layer

Overview

GlobalMart, an e-commerce startup, faces data inaccuracies, schema inconsistencies, and a lack of stakeholder trust in its data systems. What measures are needed to address and resolve these issues?


GlobalMart is a startup revolutionizing the shopping experience for its customers, both in the retail landscape and the online marketplace. As GlobalMart continues to expand, it is increasingly relying on data-driven decision making.

For GlobalMart to be data-driven, its stakeholders need accurate, up-to-date data. Unfortunately, providing it has become a major challenge and bottleneck. An effort that began as a way to improve operational efficiency and decision-making is now causing a lot of friction between stakeholders.

GlobalMart now faces the following challenges:

  • Data Silos and Absence of a Single Source of Truth
  • Data Inconsistency and Quality Issues
  • Lack of Access Control and Compliance Challenges
  • Complex and Time-Consuming Data Transformation Processes
  • Unclear Data Location and Origin Leading to Redundancy

These issues have eroded trust in the data systems to the point of rendering them useless. In this project, you will implement the following architecture, which addresses the problems GlobalMart currently faces in its data systems:

[Image: architecture diagram]

Prerequisites

  • SQL Basics
  • Python Basics
  • Fundamentals of PySpark and Distributed Computing