Introduction to Airflow
This lesson introduces you to Apache Airflow.
We’ll cover the following
The difference between Apache Airflow and other workflow management systems
Example
Apache Airflow is an open-source workflow management platform. It started at Airbnb in October 2014 as a solution for managing the company’s increasingly complex workflows.
Creating Airflow allowed Airbnb to programmatically author and schedule its workflows and to monitor them through Airflow’s built-in user interface.
Airflow can be described as a platform for defining, executing, and monitoring workflows. A workflow can be defined as any sequence of steps taken to accomplish a particular goal. Imagine that at your company, there’s a job that copies log data from machines and uploads it to an S3 bucket.
A second MapReduce job reads the log data from that S3 bucket, detects anomalies (e.g., too many logins), and writes them to an HDFS location. Finally, a third job reads the output of the second job and inserts it into a relational database. Such pipelines are common in enterprises. One challenge that growing Big Data teams face is stitching related jobs together into an end-to-end workflow. Before Airflow existed, the tool of choice for describing such workflows was Oozie, but it came with its own limitations. Over the years, Airflow has overtaken Oozie in popularity for building complex workflows.
Pipeline example
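The three jobs described above form a chain in which each job must wait for the one before it, which is exactly the kind of structure Airflow models as a DAG (directed acyclic graph). As a plain-Python illustration, not Airflow’s own API, the sketch below uses the standard library’s `graphlib` module (Python 3.9+) to compute a valid run order for the pipeline and to reject a graph that contains a cycle. The task names and the `is_dag`/`execution_order` helpers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names for the three jobs in the log-processing pipeline.
# Each key maps a task to the set of tasks it depends on (its upstream tasks).
pipeline = {
    "copy_logs_to_s3": set(),
    "detect_anomalies": {"copy_logs_to_s3"},
    "load_into_database": {"detect_anomalies"},
}


def is_dag(graph):
    """Return True if the dependency graph has no cycles."""
    try:
        TopologicalSorter(graph).prepare()  # prepare() raises CycleError on a cycle
    except CycleError:
        return False
    return True


def execution_order(graph):
    """Return one valid run order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(pipeline)
print(order)  # ['copy_logs_to_s3', 'detect_anomalies', 'load_into_database']

# Making the first job depend on the last one creates a cycle,
# so the graph is no longer a directed *acyclic* graph.
cyclic = dict(pipeline, copy_logs_to_s3={"load_into_database"})
print(is_dag(pipeline), is_dag(cyclic))  # True False
```

Because this chain is linear, only one order is valid; a graph with parallel branches would admit several, and a scheduler could run those branches concurrently.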
The difference between Apache Airflow and other workflow management systems
Here are some of the differences between Airflow and other Big Data workflow management platforms, such as Oozie:
DAGs (directed acyclic graphs), Airflow’s representation of workflows, are written in Python, which has a gentler learning curve and is more widely used by less technical users than Java. Oozie, by contrast, is a Java-based system whose workflows are defined in XML.
Airflow has a large community of contributors, so integrations are readily available for most major services and cloud providers.
Airflow is more versatile and expressive, and it can model extremely complex workflows. It also provides detailed metrics on workflow runs.
Airflow’s API is richer, and its UI is widely considered better than those of most other workflow management systems.
One of the key differentiating features of Airflow is the use of templating to replace variables or expressions with values when a template gets rendered. This allows for use cases such as referencing a unique filename that corresponds to the date of the DAG run. Jinja is the Python template engine used by Airflow to provide pipeline authors with a set of built-in parameters and macros.
As of this writing, managed Airflow cloud services have also been introduced, such as Google Cloud Composer and Astronomer.io.
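To make the templating idea concrete without installing anything, here is a toy stand-in built on the standard library’s `re` module rather than Jinja. It fills in `ds` (the run’s logical date as `YYYY-MM-DD`) and `ds_nodash` (`YYYYMMDD`), two of Airflow’s built-in template variables, to build a date-specific file name. The bucket name, path, and the `render`/`run_context` helpers are hypothetical, and the renderer only handles bare `{{ name }}` placeholders.

```python
import re
from datetime import date

# Toy renderer: handles only simple "{{ name }}" placeholders. Real Airflow uses the
# full Jinja engine, which also supports filters, expressions, and macros.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, context):
    """Replace each {{ name }} in template with str(context[name])."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def run_context(logical_date):
    """Build a small subset of the date variables Airflow exposes to templates."""
    return {
        "ds": logical_date.isoformat(),                # e.g. 2024-02-12
        "ds_nodash": logical_date.strftime("%Y%m%d"),  # e.g. 20240212
    }


# Hypothetical S3 key template: each daily run gets its own file name.
key_template = "s3://my-log-bucket/logs/{{ ds }}/events_{{ ds_nodash }}.json"
print(render(key_template, run_context(date(2024, 2, 12))))
# s3://my-log-bucket/logs/2024-02-12/events_20240212.json
```

Because the date comes from the run rather than from the wall clock, rerunning an old DAG run renders the same file name it used originally.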
Example
Airflow works with Python, which has contributed to its wide adoption, since Python is a widely used language. Let’s work through an example below to demonstrate the simplicity and flexibility with which pipelines and workflows can be set up using Airflow.
The video “Introduction Airflow” was added by user Cloudvala on 12 February 2024.