Introduction to Data Science with Python – Preprocessing Dirty Data with Pandas

Published: 16 May 2021
Channel: fiiinspires
Views: 581
Likes: 14

This tutorial shows how to use Pandas (a data manipulation and analysis library built on top of the Python programming language) efficiently for data preprocessing. We will look at how to think through and implement data cleaning tasks. We will formulate hypotheses about the data and justify why each one holds. The following data preprocessing concepts are discussed:
Checking the data types of fields
Dealing with missing values
Splitting a single column into multiple independent fields
Removing irrelevant columns
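
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the sample data and the column names (`name_city`, `joined`, `notes`) are hypothetical and chosen only to make the example self-contained.

```python
import io
import pandas as pd

# Hypothetical sample data; the video uses its own dataset.
raw = io.StringIO(
    "id,name_city,joined,notes\n"
    "1,Ada:London,2021-05-16,x\n"
    "2,Grace:New York,,y\n"
    "3,,2021-01-02,z\n"
)
df = pd.read_csv(raw)

# 1. Check the data types of the fields.
print(df.dtypes)

# 2. Deal with missing values: count them per column,
#    then drop rows that lack the field we need to split.
print(df.isna().sum())
df = df.dropna(subset=["name_city"]).copy()

# 3. Split a single column into independent fields.
df[["name", "city"]] = df["name_city"].str.split(":", expand=True)

# 4. Remove columns that are no longer relevant.
df = df.drop(columns=["name_city", "notes"])
print(df)
```

Dropping rows is only one way to handle missing values; filling them with `fillna` may be the better choice depending on the data.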

Improve your data preprocessing skills so that you can get to the insights in your data quickly. How error-free and insight-rich your analysis is depends on how well you preprocess the data.

Some Pandas Methods (Functions) Discussed
pd.read_csv – read a delimited text file into a Pandas DataFrame
pd.to_datetime – convert date-formatted strings to Pandas datetime64 values
str.strip – remove whitespace (or specified characters) from both ends of a string
str.extract – extract regex capture groups as columns of a DataFrame (or as a Series)
str.rstrip – remove trailing whitespace (or specified characters) from the right end of a string
str.title – convert a string to title case
str.split – split a string on a delimiter into a list, or into separate columns with expand=True
str.replace – replace a substring or pattern with another string
str.count – count how many times a pattern occurs in a string
str.find – return the position of the first occurrence of a substring (-1 if not found)
str.contains – test whether a string contains a substring or pattern
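
As a rough illustration of how these methods behave on a Series, here is a short sketch. The sample strings are made up for this example and are not taken from the video's dataset.

```python
import pandas as pd

# Hypothetical messy values of the form "name:city".
s = pd.Series(["  alice smith: london ", "BOB JONES:paris.", "carol:"])

cleaned = s.str.strip()              # trim whitespace at both ends
cleaned = cleaned.str.rstrip(".")    # drop trailing periods

colon_counts = cleaned.str.count(":")        # colons per value
colon_pos = cleaned.str.find(":")            # index of first colon, -1 if absent
has_london = cleaned.str.contains("london")  # boolean mask

# Split on the colon into one column per piece, then tidy each piece.
parts = cleaned.str.split(":", expand=True)
names = parts[0].str.strip().str.title()
cities = parts[1].str.strip().str.title()

# Replace a literal substring (regex=False treats ":" as plain text).
piped = cleaned.str.replace(":", " | ", regex=False)

# Convert date-formatted strings to datetime64 values.
dates = pd.to_datetime(pd.Series(["2021-05-16", "2021-01-02"]))

print(names.tolist(), cities.tolist())
print(dates.dtype)
```

All of the `str.*` methods are vectorised: they apply to every element of the Series and pass missing values through unchanged.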

Regular Expression Techniques
capturing group (parentheses)
end-of-string anchor ($, dollar sign)
lookbehind assertion
word character (\w, backslash lowercase w)
whitespace character (\s, backslash lowercase s)
question mark quantifier (match zero or one time)
asterisk quantifier (match zero or more times)

Python Function
dir – list the names (functions, classes, attributes) defined in a module or object
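
For example, `dir` can be used to browse what Pandas exposes. This is a small sketch; filtering out names that start with an underscore is a convention assumed here, not something the video necessarily does.

```python
import pandas as pd

# Public names available at the top level of the pandas module.
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names))
print("read_csv" in public_names)

# dir also works on objects, such as the .str accessor of a Series.
str_methods = [n for n in dir(pd.Series(["a"]).str) if not n.startswith("_")]
print(str_methods[:10])
```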

Timestamps
00:00 Intro
01:21 Jupyter and Import Pandas Library
02:18 Read Data into Pandas DataFrame
04:18 Count and Find Characters in a String
09:44 Split Column using Colon as Delimiter
12:16 Data Familiarization
13:30 Multiple Steps to Extract Substring – Regular Expression
16:50 Single Step to Extract Substring – Regular Expression
18:31 Split Column into Multiple Fields
21:50 Data Cleaning
25:10 Conclusion

Download
https://drive.google.com/file/d/1hUPh...

