In one of my recent projects we wanted to visualize data from the customers analytical platform based on Azure Databricks in Power BI. The connection between those two tools works pretty flawless which I also described in my previous post but the challenge was the use-case and the calculations. We wanted to display the distinct customers across various aggregations levels over a billion rows fact table. We came up with different potential solutions all having their pros and cons:
1. load all data into Power BI (import mode) and do the aggregations there
2. use Power BI with direct query and let the back-end do the heavy lifting
3. load only necessary pre-aggregated data into Power BI (import mode)
Please keep in mind that we are dealing with a distinct count measure here. Semi- and Non-additive measure like this cannot easily be aggregated from lower levels to higher levels without having all the detail data available!
Read on for Gerhard’s thoughts on each as well as the decision and process.