We'll build a data pipeline that can download and store podcast episodes using Apache Airflow, a powerful and widely used data engineering tool. This is a beginner tutorial, so we'll start off by installing Airflow and covering key Airflow concepts.
Along the way, we'll learn how to create our first data pipeline (DAG) in Airflow, how to write tasks using Operators and the TaskFlow API, how to interface with databases using Hooks, and how to run the pipeline efficiently.
By the end of the tutorial, you'll have a good understanding of how to use Airflow, as well as a project that you can extend and build on. Some extensions to this project include automatically transcribing the podcasts and summarizing them.
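To give a flavor of what the pipeline's first task does, here is a minimal, standalone sketch of parsing a podcast RSS feed with only the Python standard library. The inline feed, field names, and the `parse_episodes` helper are illustrative assumptions — in the tutorial this logic lives inside an Airflow task that fetches the real feed over HTTP.

```python
import xml.etree.ElementTree as ET

# A tiny inline RSS feed standing in for a real podcast feed
# (an actual DAG task would download this XML over HTTP first).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <link>https://example.com/ep1</link>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <link>https://example.com/ep2</link>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""

def parse_episodes(feed_xml: str) -> list[dict]:
    """Extract each episode's title, page link, and audio URL from an RSS feed."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        episodes.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "audio_url": item.find("enclosure").attrib["url"],
        })
    return episodes

episodes = parse_episodes(SAMPLE_FEED)
print(f"Found {len(episodes)} episodes")  # Found 2 episodes
```

The returned list of dictionaries is the kind of payload an Airflow task can pass downstream to the tasks that store metadata and download audio files.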
You can find the full code for the project, along with an overview, here: https://github.com/dataquestio/projec... .
Chapters
00:00 - Introduction
01:44 - Installing Airflow
07:17 - Creating the first task in our data pipeline with Airflow
17:11 - Using a SQL database with Airflow
25:30 - Storing data in a SQL database with Airflow
34:36 - Downloading podcast episodes with Airflow
38:17 - Looking at our complete data pipeline and next steps
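The "storing data in a SQL database" step above boils down to inserting only episodes we haven't seen before. Here is a hedged sketch of that idea using the standard-library sqlite3 module — the table name, columns, and `store_new_episodes` helper are assumptions for illustration; the tutorial itself talks to the database through an Airflow Hook rather than a raw connection.

```python
import sqlite3

# In-memory database for illustration; the tutorial's DAG would point
# its database hook at a file-backed SQLite database instead.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS episodes (
        link TEXT PRIMARY KEY,
        title TEXT,
        filename TEXT
    )
""")

def store_new_episodes(conn, episodes):
    """Insert episodes that aren't already in the table, keyed on link."""
    stored = 0
    for ep in episodes:
        cur = conn.execute("SELECT 1 FROM episodes WHERE link = ?", (ep["link"],))
        if cur.fetchone() is None:
            conn.execute(
                "INSERT INTO episodes (link, title, filename) VALUES (?, ?, ?)",
                (ep["link"], ep["title"], ep["filename"]),
            )
            stored += 1
    conn.commit()
    return stored

eps = [{"link": "https://example.com/ep1",
        "title": "Episode 1",
        "filename": "ep1.mp3"}]
print(store_new_episodes(conn, eps))  # 1 (new row inserted)
print(store_new_episodes(conn, eps))  # 0 (already stored, skipped)
```

Deduplicating on the episode link is what makes the pipeline safe to re-run on a schedule: each run stores and downloads only episodes it hasn't processed yet.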
---------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: https://bit.ly/3O8MDef