Data quality for big datasets

Published: 03 July 2024
on channel: Data Science Festival

A talk by Akshay Dineshkumar Jain from Innovate UK.

The talk will cover the automated data quality checks that large organisations run to verify the reliability of big datasets in real time, using data profiling and machine learning techniques. The demo will use the open source library Deequ, the Spark framework, and reporting and notification tools to flag data issues proactively. I will walk through an example of a framework I developed at Amazon and Visa to validate customer-facing data, including its integration with notification tools driven by statistical methods.
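
To give a rough sense of the kind of check described above, here is a minimal plain-Python sketch. It is not Deequ's API and not the framework from the talk. The function names (profile_column, is_anomalous, run_checks), the thresholds, and the sample rows are all assumptions made for illustration. It profiles one column, applies a completeness constraint, and uses a simple standard-deviation rule on the row count history to decide whether to raise an alert.

```python
from statistics import mean, stdev


def profile_column(rows, column):
    """Compute simple profile metrics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    total = len(values)
    return {
        "size": total,
        # Share of rows where the column is populated.
        "completeness": len(non_null) / total if total else 0.0,
        # Share of populated values that are distinct.
        "distinct_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }


def is_anomalous(history, current, max_sigma=3.0):
    """Flag `current` if it is more than `max_sigma` standard deviations
    away from the mean of `history` (hypothetical rule for illustration)."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > max_sigma


def run_checks(rows, column, size_history, min_completeness=0.95):
    """Return a list of alert messages; an empty list means all checks passed."""
    metrics = profile_column(rows, column)
    alerts = []
    if metrics["completeness"] < min_completeness:
        alerts.append(
            f"{column}: completeness {metrics['completeness']:.2f} "
            f"is below {min_completeness}"
        )
    if is_anomalous(size_history, metrics["size"]):
        alerts.append(f"row count {metrics['size']} deviates from recent history")
    return alerts


# Made-up sample batch: one missing customer_id and far fewer rows than usual.
rows = [
    {"customer_id": 1},
    {"customer_id": 2},
    {"customer_id": None},
    {"customer_id": 4},
]
alerts = run_checks(rows, "customer_id", size_history=[1000, 1010, 990, 1005])
for alert in alerts:
    print(alert)  # a real pipeline would hand these to a notification tool
```

In a Spark setting the same idea would run as distributed aggregations rather than Python loops, with the alert list passed to whichever reporting or notification tool is in use.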

Technical Level: Technical practitioner

This session was part of the Data Science Festival MayDay event 2024. Find out more at https://datasciencefestival.com/event...

The Data Science Festival is the place for data-driven people to come together, share cutting-edge ideas, and solve real-world problems. We run monthly events, meet-ups, and the biggest free-to-attend data festivals in the UK. Join the community at https://datasciencefestival.com/

