Comparing Data Lake Job Runs

Yanan Cai shows how to compare stats on different executions of a job:

Troubleshooting issues in recurring jobs is a time-consuming task. It starts with searching through the Job Browser to find instances of a recurring job and identifying both baseline and anomalous performance. Next come multi-way comparisons between job instances to figure out what has changed in the query, data, or environment, followed by analysis to discover which changes may affect performance. While this is happening, production workloads continue to under-perform or go down.

Azure Data Lake Tools for Visual Studio now makes it easy to spot anomalies and quickly trace the key characteristics across recurring job instances, allowing for an efficient debugging experience. The Pipeline Browser automatically groups recurring jobs to simplify discovery of all runs. The Related Job View collects data about inputs, outputs, and execution across multiple runs into a single visualization.
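
To make the manual workflow concrete, here is a minimal Python sketch of picking a baseline across recurring runs, flagging outliers, and diffing one run against another. It is not how the Pipeline Browser or Related Job View works internally. The `JobRun` record, its field names, the 1.5x threshold, and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRun:
    # Hypothetical record; field names are illustrative, not the tool's schema.
    run_id: str
    duration_s: float
    input_gb: float


def flag_anomalies(runs, threshold=1.5):
    """Return runs whose duration exceeds threshold x the median duration."""
    baseline = median(r.duration_s for r in runs)
    return [r for r in runs if r.duration_s > threshold * baseline]


def diff_runs(baseline, other):
    """Relative change per numeric field (assumes non-zero baseline values)."""
    return {
        field: (getattr(other, field) - getattr(baseline, field))
        / getattr(baseline, field)
        for field in ("duration_s", "input_gb")
    }


# Made-up sample values for four instances of one recurring job.
runs = [
    JobRun("run-1", 600.0, 100.0),
    JobRun("run-2", 620.0, 102.0),
    JobRun("run-3", 1500.0, 250.0),
    JobRun("run-4", 590.0, 98.0),
]

slow = flag_anomalies(runs)          # runs well above the median duration
change = diff_runs(runs[0], slow[0])  # how the slow run differs from a normal one
```

In this sample the slow run also read far more input, which points to a data change rather than a query or environment change. That is the kind of conclusion the multi-way comparison is meant to reach.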

Read on for more.

Related Posts

It’s All ETL (Or ELT) In The End

Robin Moffatt notes that ETL (and ELT) doesn’t go away in a streaming world: In the past we used ETL techniques purely within the data-warehousing and analytic space. But, if one considers why and what ETL is doing, it is actually a lot more applicable as a broader concept. Extract: Data is available from a source system Transform: We […]


Switching To Managed Disks In Azure

Chris Seferlis walks us through an easy method to convert unmanaged disks to managed disks in Azure: First off, why would you want a managed disk over an unmanaged one? Greater scalability due to much higher IOPs and storage limits. There’s no longer the need to add additional storage accounts when you’re adding disk space, […]

