Real-World Dataset Cleaning with Python Pandas! (Olympic Athletes Dataset)

Published: 01 January 1970
on channel: Keith Galli

I'm prepping a dataset for an upcoming tutorial and I figured walking through the process of cleaning it would work well for a livestream! We use various Python Pandas functions to accomplish our data cleaning goals.

We'll be working off of this repo:
https://github.com/KeithGalli/Olympic...

Some topics that we cover:
How you can use web scraping to collect data like this (Python BeautifulSoup)
Splitting strings into separate columns
Using regular expressions (regexes) to extract specific details from columns
Converting columns to datetime & numeric types
Grabbing only a subset of our columns
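
To give a rough idea of what a few of these steps can look like, here is a minimal sketch. The rows are made up for illustration, and the column names and value formats ("Used name" with bullet separators, "Measurements" like "183 cm / 76 kg") are assumptions about the data; the actual files in the repo may differ.

```python
import pandas as pd

# Made-up sample rows (not real athletes); formats are assumed, not taken from the repo.
df = pd.DataFrame({
    "Used name": ["Jane•Doe", "John•Smith", "Alex•Example"],
    "Measurements": ["183 cm / 76 kg", "170 cm", None],
    "Roles": ["Competed in Olympic Games"] * 3,
})

# Replace the bullet separator with a plain space.
df["name"] = df["Used name"].str.replace("•", " ", regex=False)

# Extract the digits in front of "cm" and "kg" with a regex, then convert to numbers.
# Rows without a match become NaN.
height = df["Measurements"].str.extract(r"(\d+)\s*cm", expand=False)
weight = df["Measurements"].str.extract(r"(\d+)\s*kg", expand=False)
df["height_cm"] = pd.to_numeric(height, errors="coerce")
df["weight_kg"] = pd.to_numeric(weight, errors="coerce")

# Keep only the subset of columns we care about.
clean = df[["name", "height_cm", "weight_kg"]]
print(clean)
```

Extracting height and weight with two separate patterns (rather than splitting on "/") means a row that lists only one of the two still ends up in the right column.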

Sorry that this was a bit last-minute scheduling-wise, I'll try to give more advance notice in the future!

Video timeline!
0:00 - Livestream Overview
4:00 - About the Olympics dataset (source website and how it was scraped)
9:50 - Cleaning the dataset (getting started with code & data)
19:26 - What aspects of our data should be cleaned?
29:08 - Get rid of bullet points in Used name column
34:08 - How to split Measurements into two separate height/weight numeric columns.
1:05:00 - Parse out dates from Born & Died columns
1:25:43 - Parse out city, region, and country from Born column (working with regular expressions)
1:41:15 - Get rid of the extra columns
1:46:08 - Next steps (how would we clean the results.csv)
1:49:41 - Questions & Answers
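
For the Born column steps in the timeline, a sketch along these lines may help. It assumes values shaped like "3 March 1950 in Springfield, Illinois (USA)"; that format, the sample rows, and the output column names (born_date, born_city, born_region, born_country) are illustrative assumptions, not necessarily what the stream ends up with.

```python
import pandas as pd

# Made-up sample values; the real Born column may contain other formats.
df = pd.DataFrame({
    "Born": [
        "3 March 1950 in Springfield, Illinois (USA)",
        "15 July 1972",  # date only, no place
        None,            # missing value
    ],
})

# Pull out a "day month year" date and convert it to datetime.
# Anything that doesn't match or can't be parsed becomes NaT.
date_text = df["Born"].str.extract(r"(\d{1,2} [A-Za-z]+ \d{4})", expand=False)
df["born_date"] = pd.to_datetime(date_text, format="%d %B %Y", errors="coerce")

# Named groups become column names in the DataFrame returned by str.extract.
place_pattern = (
    r"in (?P<born_city>[^,(]+), "
    r"(?P<born_region>[^(]+) "
    r"\((?P<born_country>[A-Z]{3})\)"
)
df = df.join(df["Born"].str.extract(place_pattern))
print(df)
```

Note that this pattern requires a city, a region, and a three-letter country code; rows that list only a city and country would need a looser pattern (for example, an optional region group).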


-------------------------
Follow me on social media!
Instagram | / keithgalli
Twitter | / keithgalli
TikTok | / keithgalli

-------------------------
Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith

Join the Python Army to get access to perks!
YouTube - / @keithgalli
Patreon - / keithgalli

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

