Batch Processing through Databricks on Azure - Part 01

2 Scenarios
4 Hours 30 Minutes
Industry: e-commerce
Skills: data-storage, cloud-management, approach, data-understanding, data-quality, data-wrangling
Tools: azure, databricks, spark, sql

Learning Objectives

Learn Databricks Fundamentals
Ingest and Process Data
Understand ADLS (Azure Data Lake Storage)
Wrangle Data Using PySpark and Spark SQL

Overview

GlobalMart, an e-commerce startup, faces data inaccuracies, schema inconsistencies, and a lack of stakeholder trust in its data systems. What measures are needed to address and resolve these issues?


GlobalMart is a startup revolutionizing the shopping experience for its customers in both the retail landscape and the online marketplace. As GlobalMart continues to expand, it increasingly relies on data-driven decision-making.

For GlobalMart to be data-driven, its stakeholders need accurate, up-to-date data. Unfortunately, providing it has become a major challenge and bottleneck. An effort that began as a way to improve operational efficiency and decision-making is now causing a lot of friction between stakeholders.

GlobalMart now faces the following challenges:

  • Data Silos and Absence of a Single Source of Truth
  • Data Inconsistency and Quality Issues
  • Lack of Access Control and Compliance Challenges
  • Complex and Time-Consuming Data Transformation Processes
  • Unclear Data Location and Origin Leading to Redundancy

These issues have led to a lack of trust in the data systems, rendering them useless. In this project, you will implement the following architecture, which addresses the problems GlobalMart currently faces in its data systems.

[Image: architecture diagram]

Prerequisites

  • SQL Basics
  • Python Basics
  • Fundamentals of PySpark and Distributed Computing