Data Morph: A Cautionary Tale of Summary Statistics — Stefanie Molin

Published: 02 September 2024
Channel: Kiwi PyCon

Recorded at Kiwi PyCon 2024 - https://kiwipycon.nz/

Statistics does not come intuitively to humans, who tend to look for simple ways to describe complex things. Given a complex dataset, we may be tempted to describe it with simple summary statistics like the mean, median, or standard deviation. However, these numbers are not a replacement for visualizing the distribution.

To illustrate this fact, researchers have generated many datasets that look very different but share the same summary statistics. In this talk, I will discuss Data Morph, an open source package that builds on previous research using simulated annealing to perturb an arbitrary input dataset into a variety of shapes, while preserving the mean, standard deviation, and correlation to multiple decimal places. I will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
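As a tiny illustration of datasets that share summary statistics but differ in shape, here is a minimal sketch using only the Python standard library. The two hand-built datasets and their names (`bimodal`, `centered`) are assumptions made for this example and do not come from the talk or from Data Morph.

```python
import math
import statistics

# Two clusters at the extremes, nothing in the middle.
bimodal = [0.0, 0.0, 10.0, 10.0]

# Two points in the middle plus one far point on each side.
spread = 5 * math.sqrt(2)
centered = [5 - spread, 5.0, 5.0, 5 + spread]

for name, data in [("bimodal", bimodal), ("centered", centered)]:
    print(
        name,
        round(statistics.mean(data), 6),
        round(statistics.median(data), 6),
        round(statistics.pstdev(data), 6),
    )
```

Both datasets report the same mean, median, and population standard deviation, yet a plot of each would look nothing alike.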

This talk introduces Data Morph, a new open source Python package that morphs an input dataset of 2D points into select shapes while preserving its summary statistics to a given number of decimal places through simulated annealing. Data Morph extends the research from Autodesk that created the Datasaurus Dozen, and is intended as a teaching tool for illustrating why you can’t rely solely on summary statistics. Come learn how it works and what it takes to translate research into an open source library.
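The general idea described above can be sketched in a few lines of plain Python. This is not Data Morph's API or its actual implementation: the function names (`summary`, `distance_to_circle`, `morph`), the circle target, the linear cooling schedule, and every parameter value are assumptions chosen for illustration. Each step nudges one point, keeps the move if it brings the point closer to the target shape (or occasionally at random, less often as the temperature drops), and undoes it if any rounded summary statistic would change.

```python
import math
import random


def summary(points, decimals=2):
    """Return (mean x, mean y, std x, std y, Pearson r), each rounded."""
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in points) / (n * sx * sy)
    return tuple(round(v, decimals) for v in (mx, my, sx, sy, r))


def distance_to_circle(point, radius=2.0):
    """Distance from a point to a circle of this radius centred at the origin."""
    return abs(math.hypot(point[0], point[1]) - radius)


def morph(points, steps=5000, start_temp=0.4, end_temp=0.01, step_size=0.1, seed=0):
    """Move points toward the circle while keeping the rounded summary fixed."""
    rng = random.Random(seed)
    points = list(points)
    target = summary(points)
    for i in range(steps):
        # Linear cooling: random (non-improving) moves become rarer over time.
        temp = start_temp + (end_temp - start_temp) * i / steps
        idx = rng.randrange(len(points))
        old = points[idx]
        new = (old[0] + rng.gauss(0, step_size), old[1] + rng.gauss(0, step_size))
        improves = distance_to_circle(new) < distance_to_circle(old)
        if not improves and rng.random() >= temp:
            continue
        points[idx] = new
        if summary(points) != target:
            points[idx] = old  # reject: a rounded statistic would have changed
    return points


rng = random.Random(42)
original = [(rng.gauss(0, 1.5), rng.gauss(0, 1.5)) for _ in range(50)]
morphed = morph(original)
print(summary(original))
print(summary(morphed))
```

Because every accepted move is checked against the starting summary, the two printed tuples match by construction. Recomputing all statistics on each step is simple but slow, which is acceptable for a sketch of this size.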

Stefanie Molin is a software engineer at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization,” which is currently in its second edition and has been translated into Korean and Chinese. She holds a Bachelor of Science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

This video is licensed under CC BY-SA 4.0 - https://creativecommons.org/licenses/...

Recorded and produced by KIT LTD (kitltdnz)

