Using Fabric notebooks (pySpark) to clean and transform real-world JSON data

Published: 16 August 2023
on channel: Learn Microsoft Fabric with Will

10+ hours of FREE Fabric Training: https://www.skool.com/microsoft-fabri...

#dataengineering #python #microsoftfabric

In this video, I use Microsoft Fabric's data engineering experience, specifically the Synapse Data Engineering notebooks (pySpark engine), to read a JSON file from our Lakehouse Files area, parse the JSON structure, clean the data, transform some of the columns, and then load the result into a Lakehouse table.
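
To give a rough feel for the parsing step, here is a minimal plain-Python sketch (not the code from the video). The sample record is made up and only modeled on an OpenWeatherMap-style response, and the output column names (`city`, `observed_unix`, `temp_kelvin`, and so on) are hypothetical. In a Fabric notebook the same idea would more likely be expressed by reading the file with `spark.read.json(...)` and selecting the nested fields from the resulting dataframe.

```python
import json

# Made-up sample, modeled on an OpenWeatherMap-style response.
# Assumption: the real file used in the video may have different fields.
RAW = """
{"name": "London", "dt": 1692172800,
 "main": {"temp": 293.456, "humidity": 64},
 "weather": [{"main": "Clouds", "description": "broken clouds"}]}
"""


def flatten_record(record: dict) -> dict:
    """Pull a few nested fields out into one flat row (hypothetical column names)."""
    main = record.get("main") or {}
    weather = record.get("weather") or [{}]  # 'weather' is a list of objects
    return {
        "city": record.get("name"),
        "observed_unix": record.get("dt"),
        "temp_kelvin": main.get("temp"),
        "humidity_pct": main.get("humidity"),
        "conditions": weather[0].get("main"),
    }


row = flatten_record(json.loads(RAW))
print(row)
```

Missing keys come back as `None` rather than raising, which is roughly how a dataframe would show nulls for fields that are absent in some records.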

This video follows on from my last video, in which I extracted data from an external weather API using Data Pipelines in Microsoft Fabric, then loaded the raw JSON into Lakehouse Files: Extract and Load from External API to...

Next parts of the series include:
Data validation of pySpark dataframes in Fabric using Great Expectations
Visualising the weather data in Power BI

-LINK TO OPENWEATHERMAP-
Here's the API used in this tutorial: https://openweathermap.org/api

-OTHER VIDEOS YOU MIGHT LIKE-
Lakehouse - Microsoft Fabric Lakehouse Tutorial
Data warehouse - Microsoft Fabric Data Warehouse Expla...
Data pipelines - Three data pipeline use cases to make...
End-to-end data flows project - Dataflows end-to-end project (Microso...
OneLake - OneLake - the FIRST thing you need to...

-TIMELINE-
0:00 Intro
1:09 Recap on the Lakehouse Files
1:40 Intro to notebook structure
2:12 Reading JSON into pySpark dataframe
4:07 Exploring the data in Azure Storage Explorer & VS Code
5:28 Parsing the JSON
8:52 Datetime conversion
9:55 Calculated columns
11:25 Rounding numbers
12:37 Code refactoring
15: Load dataframe to Lakehouse table
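
The datetime conversion, calculated column, rounding and refactoring steps listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook from the video: the input row and its field names are hypothetical, and the Kelvin-to-Celsius column is just one example of a calculated column. In pySpark, the equivalent operations would typically use `withColumn` together with functions such as `from_unixtime` and `round` from `pyspark.sql.functions`, followed by a write such as `df.write.format("delta").saveAsTable(...)` to load the Lakehouse table.

```python
from datetime import datetime, timezone


def transform_row(row: dict) -> dict:
    """Apply all transforms in one reusable function (the 'refactoring' step)."""
    # Datetime conversion: Unix epoch seconds -> timezone-aware UTC datetime.
    observed_at = datetime.fromtimestamp(row["observed_unix"], tz=timezone.utc)
    # Calculated column (assumed example): Kelvin -> Celsius.
    temp_celsius = row["temp_kelvin"] - 273.15
    return {
        "city": row["city"],
        "observed_at": observed_at,
        # Rounding: keep one decimal place for readability.
        "temp_celsius": round(temp_celsius, 1),
    }


# Hypothetical flattened input row; values are made up.
sample = {"city": "London", "observed_unix": 1692172800, "temp_kelvin": 293.456}
clean = transform_row(sample)
print(clean["observed_at"].isoformat(), clean["temp_celsius"])
```

Keeping the transforms in a single function makes them easy to rerun on each new file that lands in Lakehouse Files.
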
-LINKEDIN-
Not following the LinkedIn page yet? Here's the link:   / learnmicrosoftfabric  

-ABOUT WILL-
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share my learnings on how Microsoft Fabric works and help you build your career and build meaningful things in Fabric.

-SUBSCRIBE-
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.

