Tahir Fayyaz goes from Delta Lake to Power BI:
To get optimal performance from Power BI, it is recommended to use a star schema data model and to make use of user-defined aggregation tables. However, as you build out your facts, dimensions, and aggregation tables and views in Delta Lake, ready to be used by the Power BI data model, it can become complicated to manage all the pipelines, dependencies, and data quality, as you need to consider the following:
– How to easily develop and manage the data model’s transformation code.
– How to run and scale data pipelines for the model as data volumes grow.
– How to keep all the Delta Lake tables updated as new data arrives.
– How to view the lineage for all tables as the model gets more complex.
– How to actively stop data quality issues that result in incorrect reports.
Read on for recommendations, a couple of architectural diagrams, and some sample code.
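To make the shape of such a pipeline concrete, here is a minimal sketch using the Delta Live Tables Python API; it is not code from the linked post, and the table names (raw.customers, raw.sales, dim_customer, fact_sales, agg_sales_by_region) are hypothetical:

```python
# A minimal, hypothetical Delta Live Tables sketch: a dimension, a fact,
# and an aggregate table for a Power BI star schema, with an expectation
# to keep bad rows out of reports. Source tables are assumptions.
import dlt
from pyspark.sql import functions as F
# Note: `spark` is the SparkSession provided by the pipeline runtime.

@dlt.table(comment="Customer dimension built from an assumed raw Delta table")
def dim_customer():
    return spark.read.table("raw.customers").select(
        "customer_id", "customer_name", "region"
    )

@dlt.table(comment="Sales fact table")
@dlt.expect_or_drop("valid_customer", "customer_id IS NOT NULL")  # data quality gate
def fact_sales():
    return spark.read.table("raw.sales").select(
        "sale_id", "customer_id", "sale_date", "amount"
    )

@dlt.table(comment="Aggregate table for Power BI user-defined aggregations")
def agg_sales_by_region():
    # dlt.read() declares the dependency, so the pipeline tracks lineage
    # and refreshes tables in the right order as new data arrives.
    return (
        dlt.read("fact_sales")
        .join(dlt.read("dim_customer"), "customer_id")
        .groupBy("region", "sale_date")
        .agg(F.sum("amount").alias("total_amount"))
    )
```

Because each table declares its inputs, a pipeline like this addresses several of the concerns above: dependency ordering, lineage, and expectations-based data quality checks.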