Components¶
Component Types¶
ezbrew can be broken into the following components:
- Project
- Environment
- Pipeline
- Task
Project¶
An ezbrew project is the top-level component. It houses all the other components of ezbrew: environments, pipelines, and tasks.
An ezbrew project can be set up in multiple ways. A project could be:
- the entire data pipeline ecosystem for a company
  - example: 1 project for the whole company
- a data pipeline repository for individual teams
  - example: 1 project for each business unit within the company, or 1 project for each team
- a data pipeline for a single data product
  - example: 1 project for building the data product that tracks the active users of a company's suite of products
Environment¶
An ezbrew environment is the environment in which a project is deployed.
The environment is where you choose the tools a project uses. To start with, this will be a local k8s cluster environment in which the open-source tools are installed for building and testing data pipelines locally. The same environment will then also be used to deploy the infrastructure to the cloud.
Pipeline¶
A pipeline in ezbrew is similar to a data pipeline, i.e. a group of tasks that build out data assets within a data lake, a data warehouse, or a data lakehouse.
This can be thought of as a DAG in Airflow, a flow in Prefect or a job in Dagster.
This is where we define the schedule, the task dependency graph, the notification channels, etc. for a given data pipeline or a data product.
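As a rough illustration of the three things a pipeline definition carries (schedule, task dependency graph, notification channels), here is a minimal Python sketch. It is not ezbrew's actual API or configuration format: the `PipelineSpec` class, its field names, the cron-style schedule, and the task names are all hypothetical. It only uses the standard library's `graphlib` to show how a dependency graph determines the order in which tasks can run.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class PipelineSpec:
    """Hypothetical pipeline definition; not ezbrew's real schema."""

    name: str
    schedule: str  # assumed to be a cron expression
    # Maps each task name to the set of task names it depends on.
    dependencies: dict[str, set[str]] = field(default_factory=dict)
    notification_channels: list[str] = field(default_factory=list)

    def execution_order(self) -> list[str]:
        """Return task names in an order that respects the dependencies."""
        return list(TopologicalSorter(self.dependencies).static_order())


# Illustrative pipeline for an "active users" data product.
pipeline = PipelineSpec(
    name="active_users",
    schedule="0 6 * * *",
    dependencies={
        "ingest_events": set(),
        "transform_active_users": {"ingest_events"},
        "quality_check": {"transform_active_users"},
        "publish_metadata": {"quality_check"},
    },
    notification_channels=["email:data-team@example.com"],
)

order = pipeline.execution_order()
print(order)
```

Because each task here depends on exactly one predecessor, there is only one valid order. With a branching graph, `static_order()` may return any order that satisfies the dependencies.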
Task¶
A task is the smallest component in ezbrew: a unit of work performed as part of a data pipeline.
A task can, for example:
- ingest data into a data lake
- transform a data asset in a data lakehouse
- push data into an external system
- publish metadata to an external system
- trigger a data quality check on a data asset to verify the integrity of the object
Component Hierarchy¶
The following diagram depicts the hierarchy of the components within ezbrew:
- Project is the top-level component
- An Environment is linked to a Project
- A Project can have multiple Pipelines
- A Pipeline can have multiple Tasks
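The hierarchy above can be sketched as nested data structures. The following Python snippet is only an illustration under stated assumptions: the class and field names are hypothetical and are not taken from ezbrew's code, and modelling environments as a list on the project is an assumption about the cardinality of the Environment link.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str


@dataclass
class Pipeline:
    name: str
    tasks: list[Task] = field(default_factory=list)  # a Pipeline can have multiple Tasks


@dataclass
class Environment:
    name: str


@dataclass
class Project:
    """Top-level component (hypothetical model, not ezbrew's real classes)."""

    name: str
    # Assumption: a project may be linked to more than one environment.
    environments: list[Environment] = field(default_factory=list)
    pipelines: list[Pipeline] = field(default_factory=list)  # multiple Pipelines

    def task_count(self) -> int:
        """Total number of tasks across all pipelines in the project."""
        return sum(len(p.tasks) for p in self.pipelines)


project = Project(
    name="analytics",
    environments=[Environment("local-k8s")],
    pipelines=[
        Pipeline("active_users", [Task("ingest"), Task("transform")]),
        Pipeline("billing", [Task("ingest")]),
    ],
)

# Walk the hierarchy from the top-level component down to individual tasks.
for pipeline in project.pipelines:
    for task in pipeline.tasks:
        print(f"{project.name} / {pipeline.name} / {task.name}")
```

Keeping tasks reachable only through their pipeline, and pipelines only through their project, mirrors the containment described in the list above.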