Zomato Analysis

Netflix Analysis

Project Summary

I am working for my client Maven Market, a multi-national grocery chain with locations in Canada, Mexico & the United States. I will be working through the end-to-end business intelligence workflow:
Connecting to multiple data sources, transformation of data, building a relational model, adding calculated columns & measures, designing interactive reports, and publishing the reports to Power BI Services.

Project information

PART 1: Connecting & Transforming the Data

1) I am connecting Customers.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:

  • I've added a new column named "full_name" to merge the the "first_name" and "last_name" columns, separated by a space
  • Created a new column named "birth_year" to extract the year from the "birthdate" column, and format as text
  • Created a conditional column named "has_children" which equals "N" if "total_children" = 0, otherwise "Y"
  • Also updated datatypes of all columns to text and whole number as per their nature.
After data transformation through Power Query, the data will look like this 👇

2) I am connecting Products.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:

  • Added a calculated column named "discount_price", equal to 90% of the original retail price
  • Replaced "null" values with zeros in both the "recyclable" and "low-fat" columns
After data transformation through Power Query, the data will look like this 👇

3) I am connecting Stores.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:

  • Added a calculated column named "full_address", by merging "store_city", "store_state", and "store_country", separated by a comma and space
  • Added a calculated column named "area_code", by extracting the characters before the dash ("-") in the "store_phone" field
After data transformation through Power Query, the data will look like this 👇

4) I am connecting Calendar.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:

  • Adding add date columns this will help in time intelligence calculations:
    a) Start of Week
    b) Name of Day
    c) Start of Month
    d) Name of Month
    e) Quarter of Year
    f) Year
After data transformation through Power Query, the data will look like this 👇

5) I am connecting a folder in Power BI through Get Data feature to add Transactions_1997 and Transactions_1998. Also, doing some Data Cleansing as below:

  • Connected to the folder path, and choose "Edit" (vs. Combine and Edit)
  • Removed spaces and split the date columns by delimiters then merge in correct date format.
After data transformation through Power Query, the data will look like this 👇

I have connected all tables in Power BI, now our data is ready for Data Analysis and Data Visualization. Here's all tables:

Now let's move to Data Modelling and connect all related table with each other.

PART 2: Creating the Data Model

Using the report I created in Part 1, I will complete the following steps:
  • Connected Transactions to Customers, Products, and Stores using valid primary/foreign keys
  • Connected Transactions to Calendar using both date fields
  • Connected Returns to Products, Calendar, and Stores using valid primary/foreign keys
  • Connected Stores to Regions. This is snowflake schema as dimension table having other dimension table.
All tables are now connected with each other, now I'm adding some calculated columns & DAX measures and getting it ready for Data Analysis.

PART 3: Adding DAX Measures & Calculated Columns

Using the report I created in Part 2, now I'm adding all calculated column and measures with the help of DAX :
  • In the Calendar table, add a column named "Weekend" - Equals "Y" for Saturdays or Sundays (otherwise "N")
  • Weekend = IF(OR('Calendar'[Day Name]=="Saturday",'Calendar'[Day Name]=="Sunday"),"Y","N")
  • In the Calendar table, add a column named "End of Month" - Returns the last date of the current month for each row
  • End of Month = ENDOFMONTH('Calendar'[date].[Date])
  • In the Customers table, add a column named "Current Age" - Calculates current customer ages using the "birthdate" column and the TODAY() function
  • Current Age = DATEDIFF(Customers[birthdate],TODAY(),YEAR)
  • In the Customers table, add a column named "Priority" - Equals "High" for customers who own homes and have Golden membership cards (otherwise "Standard")
  • Priority = IF(AND(Customers[homeowner]=="Y",Customers[member_card]=="Golden"),"High","Standard")
  • In the Customers table, add a column named "Short_Country" - Returns the first three characters of the customer country, and converts to all uppercase
  • Short_Country = upper(LEFT(Customers[Country/Region],3))
  • In the Customers table, add a column named "House Number" - Extracts all characters/numbers before the first space in the "customer_address" column
  • House Number = LEFT(Customers[customer_address], SEARCH(" ", Customers[customer_address]))
  • In the Products table, add a column named "Price_Tier" - Equals "High" if the retail price is >$3, "Mid" if the retail price is >$1, and "Low" otherwise
  • Price_Tier = IF(Products[product_retail_price]>3, "High", IF(Products[product_retail_price]>1, "Mid", "Low"))
  • In the Stores table, add a column named "Years_Since_Remodel" - Calculates the number of years between the current date (TODAY()) and the last remodel date
  • Years_Since_Remodel = DATEDIFF(Stores[last_remodel_date],TODAY(), YEAR)
  • Create a new measure named "Weekend Transactions" to calculate transactions on weekends
  • Weekend Transactions = CALCULATE([Total Transactions], 'Calendar'[Weekend] = "Y")
  • Create new measures named "All Transactions" and "All Returns" to calculate grand total transactions and returns (regardless of filter context)
  • All Transactions = CALCULATE([Total Transactions], ALL(Transactions_Data))
    All Return = CALCULATE([Total Returns], ALL(Return_Data))
  • Create a new measure to calculate "Total Revenue" based on transaction quantity and product retail price, and format as $
  • Total Revenue = SUMX(Transactions_Data,Transactions_Data[quantity] * RELATED(Products[product_retail_price]))
  • Create a new measure to calculate "Total Cost" based on transaction quantity and product cost, and format as $
  • Total Cost = SUMX(Transactions_Data, Transactions_Data[quantity] * RELATED(Products[product_cost]))
  • Create a new measure named "Total Profit" to calculate total revenue minus total cost, and format as $
  • Total Profit = Transactions_Data[Total Revenue]-Transactions_Data[Total Cost]
  • Create a new measure to calculate "Profit Margin" by dividing total profit by total revenue calculate total revenue
  • Profit Margin = DIVIDE(Transactions_Data[Total Profit],Transactions_Data[Total Revenue])
  • Create a new measure named "Unique Products" to calculate the number of unique product names in the Products table
  • Unique Products = DISTINCTCOUNT(Products[product_name])
  • Create new measures named "Last Month Transactions", "Last Month Revenue", "Last Month Profit".
  • Last Month Transactions = CALCULATE(Transactions[Total Transactions], DATEADD('Calendar'[Date],-1,MONTH))
    Last Month Revenue = CALCULATE(Transactions[Total Revenue], DATEADD('Calendar'[Date],-1,MONTH))
    Last Month Profit = CALCULATE(Transactions[Total Profit], DATEADD('Calendar'[Date],-1,MONTH))

PART 4: Building the Report

Now its time to build interactive dashboards so that end business users can extract valuable insights for business. Here's my sample dashboards:

PART 5: Publishing the Report to Power BI Services and Subscribed to report:

I've successfully published the report to workspace, now it is be accessable by my client or any business user :
I've also created subscription for my client so that they will get a snapshot of report in their email:
Recieved report snapshot via email: