From telemetry data to CSVs with Python, Spark and Azure Databricks
[EuroPython 2021 - Talk - 2021-07-28 - Parrot [Data Science]]
[Online]
By Nicolò Giso
Tenova is an engineering company working alongside client-partners to design and develop innovative technologies and services that improve their business, creating solutions that help metals and mining companies to reduce costs, save energy, limit environmental impact and improve working conditions for their employees.
In the context of Industry 4.0, Tenova provides each equipment with a field gateway, named Tenova Edge, to collect telemetry data, perform edge analytics with AI models and send data to the Tenova Platform (hosted on Microsoft Azure) for further elaborations.
To develop analytics solutions, data scientists and process engineers need the data in a manageable format.
Furthermore, continuous retraining of AI models is necessary to guarantee high performances and reliable results.
For all of these reasons, we needed to implement an ETL solution to transform the raw data in formats ready for analysis and retraining. In particular, the key requirement was to convert the JSON Lines files coming from the field in CSV files ready to be used.
The CSV files have to satisfy the following conditions:
each file contains the data for a device
only one file for device per day
each file has a midnight row containing for each cell the value recorded at midnight or the last value of the previous day (SPOILER: here it’s where the fun happens!)
For this purpose, we have implemented a series of Databricks Notebooks, run daily by Azure DataFactory, that leveraging Pyspark and Pandas manipulates the raw JsonLines files in nicely formatted CSVs.
License: This video is licensed under the CC BY-NC-SA 4.0 license: https://creativecommons.org/licenses/...
Please see our speaker release agreement for details: https://ep2021.europython.eu/events/s...
Смотрите видео Nicolò Giso - From telemetry data to CSVs with Python, Spark and Azure Databricks онлайн без регистрации, длительностью часов минут секунд в хорошем качестве. Это видео добавил пользователь EuroPython Conference 27 Сентябрь 2021, не забудьте поделиться им ссылкой с друзьями и знакомыми, на нашем сайте его посмотрели 451 раз и оно понравилось 4 людям.