AI-Assisted Feature Selection for Big Data Modeling

Published: 21 July 2020
on channel: Databricks

Thanks to the power of Spark, the number of features going into models is growing at an exponential rate, and so is the number of models each company creates. The common approach is to throw as many features as possible into a model. But features that don't improve the model can easily end up hurting it by increasing model complexity, reducing accuracy, and making the model harder for users to understand. Because finding and removing noisy features takes a lot of manual effort, most teams either skip this step or do it sparingly. We have developed an AI-assisted way to identify which features improve the accuracy of a model and by how much. In addition, we present a sorted list of features, each with an estimate of the accuracy improvement (e.g., R²) expected from including it.

Several methods for automated feature selection already exist, but almost all of them are computationally expensive and do not translate to big data applications. In this work, we introduce a fast feature selection algorithm that automatically drops the less relevant input features while preserving, and in some cases enhancing, model accuracy. The method starts with an automated feature relevance ranking based on bootstrapped model training. This ranking determines the order of feature elimination, which is much more efficient than randomized feature elimination. Additional simplifying assumptions during feature selection, together with our distributed implementation of the process, enable fast, parallelized feature selection on medical big data.
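As a rough illustration of the ranking-then-elimination idea described above, here is a minimal single-machine sketch in Python with NumPy. It is not the implementation from the talk: the linear least-squares model, the use of standardized coefficient magnitude as the relevance score, the holdout split, the tolerance `tol`, and all function names are assumptions made for the example, and nothing here is distributed or Spark-based.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination (R^2)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, w_1, ..., w_k]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def holdout_r2(X_tr, y_tr, X_va, y_va, cols):
    """Train on the given columns and score R^2 on the validation split."""
    coef = fit_ols(X_tr[:, cols], y_tr)
    return r2_score(y_va, coef[0] + X_va[:, cols] @ coef[1:])

def bootstrap_relevance(X, y, n_boot=30, seed=0):
    """Assumed relevance score: mean |standardized coefficient| over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        scores += np.abs(fit_ols(Xs[idx], y[idx])[1:])
    return scores / n_boot

def ranked_elimination(X_tr, y_tr, X_va, y_va, tol=1e-3):
    """Try dropping features from least to most relevant.

    A feature is dropped if validation R^2 stays within `tol` of the best
    score seen so far (an assumed stopping rule, not the talk's).
    """
    relevance = bootstrap_relevance(X_tr, y_tr)
    kept = list(range(X_tr.shape[1]))
    best = holdout_r2(X_tr, y_tr, X_va, y_va, kept)
    for j in np.argsort(relevance):  # least relevant first
        if len(kept) == 1:
            break
        trial = [c for c in kept if c != j]
        score = holdout_r2(X_tr, y_tr, X_va, y_va, trial)
        if score >= best - tol:
            kept, best = trial, max(best, score)
    return kept, best, relevance

# Synthetic demo: 8 features, only the first 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=600)
kept, score, relevance = ranked_elimination(X[:400], y[:400], X[400:], y[400:])
print("kept features:", kept)
print("validation R^2:", round(float(score), 4))
```

Because the ranking fixes the elimination order, the loop trains one model per candidate feature rather than searching over random subsets, which is the efficiency argument the description makes. The bootstrap fits are independent of one another, so in principle they could be run in parallel.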

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...

Connect with us:
Website: https://databricks.com
Facebook: / databricksinc
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc

Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-nam...

