Using Fabric notebooks (pySpark) to clean and transform real-world JSON data

Published: 16 August 2023
on the channel: Learn Microsoft Fabric with Will

10+ hours of FREE Fabric Training: https://www.skool.com/microsoft-fabri...

#dataengineering #python #microsoftfabric

In this video, I use Microsoft Fabric's data engineering experience, specifically the Synapse Data Engineering notebooks (pySpark engine), to read a JSON file from our Lakehouse Files area, parse the JSON structure, clean the data, transform some of the columns and then load the data into a Lakehouse table.
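
To give a feel for the parse, convert, calculate and round steps, here is a minimal sketch of the per-record logic in plain Python rather than pySpark, so it runs anywhere. It is not the notebook code from the video. The field names (`name`, `dt`, `main.temp`, `wind.speed`) are an assumption based on the shape of OpenWeatherMap's current-weather response, the sample values are made up, and the output column names are hypothetical. The final load into a Lakehouse table is not shown.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample record, shaped like an OpenWeatherMap current-weather
# response. The values are invented for illustration only.
RAW_JSON = """
{
  "name": "London",
  "dt": 1692187200,
  "main": {"temp": 293.15, "humidity": 60},
  "wind": {"speed": 4.63}
}
"""


def transform(record: dict) -> dict:
    """Flatten one nested weather record into a single table-like row."""
    temp_kelvin = record["main"]["temp"]
    return {
        "city": record["name"],
        # Datetime conversion: Unix epoch seconds to a timezone-aware UTC datetime.
        "observed_at": datetime.fromtimestamp(record["dt"], tz=timezone.utc),
        # Calculated column: Kelvin to Celsius, rounded to one decimal place.
        "temp_celsius": round(temp_kelvin - 273.15, 1),
        "humidity_pct": record["main"]["humidity"],
        # Rounding: keep wind speed to one decimal place.
        "wind_speed_ms": round(record["wind"]["speed"], 1),
    }


row = transform(json.loads(RAW_JSON))
print(row)
```

In a notebook, the same steps would be expressed as dataframe operations instead of a Python function applied to one record at a time.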

This video follows on from my last video, in which I extracted data from an external weather API using Data Pipelines in Microsoft Fabric, then loaded the raw JSON into Lakehouse Files. • Extract and Load from External API to...

Next parts of the series include:
Data validation of pySpark dataframes in Fabric using Great Expectations
Visualising the weather data in Power BI.

-LINK TO OPENWEATHERMAP-
Here's the API used in this tutorial: https://openweathermap.org/api

-OTHER VIDEOS YOU MIGHT LIKE-
Lakehouse - • Microsoft Fabric Lakehouse Tutorial
Data warehouse - • Microsoft Fabric Data Warehouse Expla...
Data pipelines - • Three data pipeline use cases to make...
End-to-end data flows project - • Dataflows end-to-end project (Microso...
OneLake - • OneLake - the FIRST thing you need to...

-TIMELINE-
0:00 Intro
1:09 Recap on the Lakehouse Files
1:40 Intro to notebook structure
2:12 Reading JSON into pySpark dataframe
4:07 Exploring the data in Azure Storage Explorer & VS Code
5:28 Parsing the JSON
8:52 Datetime conversion
9:55 Calculated columns
11:25 Rounding numbers
12:37 Code refactoring
15: Load dataframe to Lakehouse table

-LINKEDIN-
Not following the LinkedIn page yet? Here's the link:   / learnmicrosoftfabric  

-ABOUT WILL-
Hi, I'm Will! I'm hugely passionate about data and using it to create a better world. I currently work as a Consultant, focusing on Data Strategy, Data Engineering and Business Intelligence (within the Microsoft/Azure/Fabric environment). I have previously worked as a Data Scientist. I started Learn Microsoft Fabric to share my learnings on how Microsoft Fabric works and help you build your career and build meaningful things in Fabric.

-SUBSCRIBE-
Not subscribed yet? You should! There are lots of new videos in the pipeline covering all aspects of Microsoft Fabric.

