Using AU Analyzer To Lower Data Lake Analytics Costs

Matthew Hicks shows off the Data Lake Analytics AU Analyzer:

The AU Analyzer looks at all the vertices (or nodes) in your job, analyzes how long they ran and their dependencies, then models how long the job might run if a certain number of vertices could run at the same time. Each vertex may have to wait for input or for its spot in line to run. The AU Analyzer isn’t 100% accurate, but it provides general guidance to help you choose the right number of AUs for your job.

You’ll notice that there are diminishing returns when assigning more AUs, mainly because of input dependencies and the running times of the vertices themselves. So, a job with 10,000 total vertices likely won’t be able to use 10,000 AUs at once, since some will have to wait for input or for dependent vertices to complete.

In the graph below, here’s what the modeler might produce, when considering the different options. Notice that when the job is assigned 1427 AUs, assigning more won’t reduce the running time. 1427 is the “peak” number of AUs that can be assigned.

I like this kind of tooling, as it provides a realistic assessment of tradeoffs.

Related Posts

Azure SQL Database SLAs

Arun Sirpal ponders the Azure SQL Database service level agreement: Lets just get straight to the point, Azure SQL Database across all service tiers gives you the customer a SLA of 99.99% up-time. This means potential unavailability periods shown below. Good, bad, you decide. The point is that even in the cloud we “could” potentially encounter downtime. […]

Read More

Getting Started With Azure Databricks

David Peter Hansen has a quick walkthrough of Azure Databricks: RUN MACHINE LEARNING JOBS ON A SINGLE NODE A Databricks cluster has one driver node and one or more worker nodes. The Databricks runtime includes common used Python libraries, such as scikit-learn. However, they do not distribute their algorithms. Running a ML job only on the driver might not […]

Read More

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031