Brew That Data!
What is ezbrew?¶
ezbrew is a simple-to-use, interactive product that enables all data users to build end-to-end data pipelines anytime, anywhere, using their preferred open-source data tools. Through ezbrew, the work of juggling multiple tools and the challenges of the modern data stack are simplified significantly.
By using our single interactive interface and a library of user-friendly ez-Recipes and ez-Standards, you can now easily create end-to-end data pipelines in your local development environment using your preferred open-source data tools. And once you've perfected the data pipeline, deploying it to your cloud platform is going to be a breeze with ez-Deploy.
This reduces the time developers spend on data setup and increases their productivity, allowing them to focus on building the pipelines and insights that matter and to iterate quickly on business use cases.
Why ezbrew?¶
ezbrew aims to provide a ton of benefits for data pipeline and data product developers, be it a data engineer, an analytics engineer, a data scientist, or even a software engineer building data products. The following are some of the main benefits of using ezbrew:
Reduces the time to start building data pipelines
ezbrew provides a choice of open-source data tools to build out the data platform.
All the heavy lifting of installing these tools, individually or together, is done by ezbrew. Users select the tools they would like to evaluate or build their data pipelines with, and bring up their local development environment in a matter of minutes rather than the days, weeks, or months it would take to install and integrate them to work together.
Easy installs on local development machines help you quickly prototype and evaluate what would work best for the data platform, whether for the whole ecosystem or for a specific data product.
Building around this ecosystem with a standard language, of sorts, also means that you can easily switch between tools when the need arises, making the migration path easy with minimal to no code changes required.
For the GA launch, we aim to provide a choice of two tools in each category: ingest, transform, quality, orchestrate, and visualize.
Boosts Developer Productivity
The main benefits that ezbrew brings to boost developer productivity are:
- Faster feedback loops during pipeline development
- No noisy-neighbor problems when unit testing pipelines or validating data
- Built-in transformation standards
Having a local development environment brings a lot of benefits to the user. Pipelines can be evaluated as they are built without deploying to the test cloud environments, thus providing faster feedback loops to the developer.
This also means that the user can unit test the pipeline and validate the data by executing it from the local development machine, without stepping on teammates' toes by testing the same pipeline at the same time.
Lastly, the built-in transformation standards, also known as ez-Standards, help standardize transformation patterns across the data ecosystem and cut down development time. Some patterns we plan to tackle are generalizing data such as timestamps, and handling PII using masking or hashing techniques.
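To make these patterns concrete, here is a minimal Python sketch of the two examples above, PII hashing and timestamp generalization. This is illustrative only: the function names are hypothetical, and ez-Standards' actual interface may look quite different.

```python
import hashlib
from datetime import datetime


def mask_pii(value: str, salt: str = "pipeline-salt") -> str:
    """Irreversibly hash a PII value (e.g. an email) with SHA-256.

    A hypothetical helper in the spirit of the ez-Standards PII-handling
    pattern; the salt prevents trivial rainbow-table lookups.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def generalize_timestamp(ts: datetime) -> str:
    """Generalize a precise timestamp down to date granularity."""
    return ts.date().isoformat()


print(mask_pii("jane.doe@example.com"))
print(generalize_timestamp(datetime(2023, 5, 17, 14, 32, 9)))
```

Standardizing small, well-tested helpers like these across every pipeline is what removes the per-team reinvention that ez-Standards aims to eliminate.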
Easily deploy and maintain data pipelines/products
ezbrew not only provides a local development environment to build and test in; it also provides IaC templates to build out the data platform on your preferred cloud provider (Google Cloud Platform is the provider selected for the GA launch, with other cloud providers coming eventually).
ezbrew also provides a Git-enabled project repository during project initiation and will include workflow templates to manage continuous integration/continuous deployment through major cloud-based Git providers like GitHub, Bitbucket, and GitLab (with GitHub being the tool of choice for the GA launch, and others onboarded eventually).
Pipeline notifications and runtime metadata collection are two additional features that help users maintain their data pipelines and sleep easy, knowing they will be notified when things go wrong.
- Pipeline notifications integrate with communication platforms like Slack and Microsoft Teams (Slack being the tool of choice for the GA launch) and with incident management platforms like Opsgenie and PagerDuty (Opsgenie being the tool of choice for the GA launch)
- Runtime metadata collection integrates with monitoring tools like Prometheus or SignalFx (Prometheus being the tool of choice for the GA launch)
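The kind of runtime metadata a pipeline run would report can be sketched in plain Python. This is a stand-in, not ezbrew's actual collector: the decorator, record fields, and pipeline name are all assumptions, and in practice the records would be exported to a tool such as Prometheus rather than held in a list.

```python
import time
from functools import wraps


def collect_run_metadata(runs: list):
    """Decorator that records the duration and status of each pipeline run.

    A minimal, hypothetical stand-in for runtime metadata collection;
    each run appends one record to `runs`, even when the pipeline fails.
    """
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            status = "failure"
            try:
                result = fn(*args, **kwargs)
                status = "success"
                return result
            finally:
                runs.append({
                    "pipeline": fn.__name__,
                    "status": status,
                    "duration_s": time.monotonic() - start,
                })
        return wrapper
    return decorator


runs = []


@collect_run_metadata(runs)
def daily_sales_pipeline():
    # Stand-in for real pipeline work.
    return "ok"


daily_sales_pipeline()
print(runs[0]["pipeline"], runs[0]["status"])
```

Capturing status and duration per run is exactly the signal an alerting rule needs to page someone when a pipeline fails or slows down.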
Increased confidence & trust
Building fast and easy is definitely good, but of no use if the data product itself is not trustworthy. The goal here is that, by leveraging ezbrew's dataset manager, data integrity and quality checks can be auto-generated, providing a higher level of confidence in the data assets built through ezbrew. Developers can also easily enhance or extend these checks.
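Two of the most common auto-generated checks are not-null and uniqueness constraints. The sketch below shows what such checks might look like in plain Python; the function names and the sample `orders` data are hypothetical, and ezbrew's dataset manager would generate and run its own equivalents.

```python
def check_not_null(rows, column):
    """Pass if no row has a null value in the given column."""
    return all(row.get(column) is not None for row in rows)


def check_unique(rows, column):
    """Pass if every value in the given column is distinct."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


# Hypothetical sample data standing in for a pipeline's output table.
orders = [
    {"order_id": 1, "amount": 42.0},
    {"order_id": 2, "amount": 13.5},
]

print(check_not_null(orders, "amount"))   # no missing amounts
print(check_unique(orders, "order_id"))   # order_id is a valid key
```

Because each check is just a predicate over the dataset, auto-generated checks like these are easy for a developer to extend with domain-specific rules.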
With data observability in place, the developer and the data stakeholders gain insight into the health of the infrastructure and the pipeline, and get notified as soon as something does not seem right. Eventually, integrating with data discovery tools for catalog and lineage would not only educate data stakeholders about the available data assets, but would also help the developer quickly identify the downstream impact of failures and communicate efficiently across the board.