The use of databricks for data engineering or data analytics workloads is becoming more prevalent as the platform grows, and has made its way into most of our recent modern data architecture proposals – whether that be PaaS warehouses, or data science platforms.
To run any type of workload on the platform, you will need to setup a cluster to do the processing for you. While the Azure-based platform has made this relatively simple for development purposes, i.e. give it a name, select a runtime, select the type of VMs you want and away you go – for production workloads, a bit more thought needs to go into the configuration/cost. In the following blog I’ll start by looking at the pricing in a bit more detail which will aim to provide a cost element to the cluster configuration process.
There are a few complicating factors in figuring out cluster price but rest assured that it will be costly.