Zomato Analysis
Netflix Analysis
Project Summary
I am working for my client Maven Market, a multi-national grocery chain with locations in Canada, Mexico & the United States. I will be working through the end-to-end business intelligence workflow:Connecting to multiple data sources, transformation of data, building a relational model, adding calculated columns & measures, designing interactive reports, and publishing the reports to Power BI Services.
Project information
- Category: Business Intelligence Report
- Client: Maven Market
- Report Published at: Power BI Services
- Github Repo: Power-BI-Projects
PART 1: Connecting & Transforming the Data
1) I am connecting Customers.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:
- I've added a new column named "full_name" to merge the the "first_name" and "last_name" columns, separated by a space
- Created a new column named "birth_year" to extract the year from the "birthdate" column, and format as text
- Created a conditional column named "has_children" which equals "N" if "total_children" = 0, otherwise "Y"
- Also updated datatypes of all columns to text and whole number as per their nature.
After data transformation through Power Query, the data will look like this 👇
2) I am connecting Products.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:
- Added a calculated column named "discount_price", equal to 90% of the original retail price
- Replaced "null" values with zeros in both the "recyclable" and "low-fat" columns
After data transformation through Power Query, the data will look like this 👇
3) I am connecting Stores.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:
- Added a calculated column named "full_address", by merging "store_city", "store_state", and "store_country", separated by a comma and space
- Added a calculated column named "area_code", by extracting the characters before the dash ("-") in the "store_phone" field
After data transformation through Power Query, the data will look like this 👇
4) I am connecting Calendar.csv file in Power BI through Get Data feature. Also, doing some Data Cleansing as below:
- Adding add date columns this will help in time intelligence calculations:
a) Start of Week
b) Name of Day
c) Start of Month
d) Name of Month
e) Quarter of Year
f) Year
After data transformation through Power Query, the data will look like this 👇
5) I am connecting a folder in Power BI through Get Data feature to add Transactions_1997 and Transactions_1998. Also, doing some Data Cleansing as below:
- Connected to the folder path, and choose "Edit" (vs. Combine and Edit)
- Removed spaces and split the date columns by delimiters then merge in correct date format.
After data transformation through Power Query, the data will look like this 👇
I have connected all tables in Power BI, now our data is ready for Data Analysis and Data Visualization. Here's all tables:
Now let's move to Data Modelling and connect all related table with each other.
PART 2: Creating the Data Model
Using the report I created in Part 1, I will complete the following steps:
- Connected Transactions to Customers, Products, and Stores using valid primary/foreign keys
- Connected Transactions to Calendar using both date fields
- Connected Returns to Products, Calendar, and Stores using valid primary/foreign keys
- Connected Stores to Regions. This is snowflake schema as dimension table having other dimension table.
All tables are now connected with each other, now I'm adding some calculated columns & DAX measures and getting it ready for Data Analysis.
PART 3: Adding DAX Measures & Calculated Columns
Using the report I created in Part 2, now I'm adding all calculated column and measures with the help of DAX :
- In the Calendar table, add a column named "Weekend" - Equals "Y" for Saturdays or Sundays (otherwise "N")
Weekend = IF(OR('Calendar'[Day Name]=="Saturday",'Calendar'[Day Name]=="Sunday"),"Y","N")
End of Month = ENDOFMONTH('Calendar'[date].[Date])
Current Age = DATEDIFF(Customers[birthdate],TODAY(),YEAR)
Priority = IF(AND(Customers[homeowner]=="Y",Customers[member_card]=="Golden"),"High","Standard")
Short_Country = upper(LEFT(Customers[Country/Region],3))
House Number = LEFT(Customers[customer_address], SEARCH(" ", Customers[customer_address]))
Price_Tier = IF(Products[product_retail_price]>3, "High", IF(Products[product_retail_price]>1, "Mid", "Low"))
Years_Since_Remodel = DATEDIFF(Stores[last_remodel_date],TODAY(), YEAR)
Weekend Transactions = CALCULATE([Total Transactions], 'Calendar'[Weekend] = "Y")
All Transactions = CALCULATE([Total Transactions], ALL(Transactions_Data))
All Return = CALCULATE([Total Returns], ALL(Return_Data))
Total Revenue = SUMX(Transactions_Data,Transactions_Data[quantity] * RELATED(Products[product_retail_price]))
Total Cost = SUMX(Transactions_Data, Transactions_Data[quantity] * RELATED(Products[product_cost]))
Total Profit = Transactions_Data[Total Revenue]-Transactions_Data[Total Cost]
Profit Margin = DIVIDE(Transactions_Data[Total Profit],Transactions_Data[Total Revenue])
Unique Products = DISTINCTCOUNT(Products[product_name])
Last Month Transactions = CALCULATE(Transactions[Total Transactions], DATEADD('Calendar'[Date],-1,MONTH))
Last Month Revenue = CALCULATE(Transactions[Total Revenue], DATEADD('Calendar'[Date],-1,MONTH))
Last Month Profit = CALCULATE(Transactions[Total Profit], DATEADD('Calendar'[Date],-1,MONTH))
PART 4: Building the Report
Now its time to build interactive dashboards so that end business users can extract valuable insights for business. Here's my sample dashboards:
PART 5: Publishing the Report to Power BI Services and Subscribed to report: