Handling Rogue Queries In Spark

Alicja Luszczak, et al., introduce the Query Watchdog:

The previous query would cause problems on many different systems, regardless of whether you’re using Databricks or another data warehousing tool. Luckily, as a user of Databricks, this customer has a feature available called the Query Watchdog that can help solve this problem.

Note: Query Watchdog is available on clusters created with version 2.1-db3 and greater.

The Query Watchdog is a simple process that checks whether a given query is creating too many output rows for the number of input rows at a task level. We can set a property to control this, and in this example we will use a ratio of 1000 (which is the default).
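As a rough idea of what this looks like in practice, here is a minimal sketch of turning the watchdog on and setting that ratio from a Spark session. The configuration key names below are assumptions based on the Databricks post, so check your runtime's documentation for the exact names and defaults.

```scala
import org.apache.spark.sql.SparkSession

// In a Databricks notebook the `spark` session already exists; built here for completeness.
val spark = SparkSession.builder().getOrCreate()

// Enable the watchdog (assumed key name) so runaway tasks get flagged.
spark.conf.set("spark.databricks.queryWatchdog.enabled", true)

// Flag a task once it produces more than 1000x as many output rows as it read.
// 1000 is the default ratio described in the post.
spark.conf.set("spark.databricks.queryWatchdog.outputRatioThreshold", 1000L)
```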

It looks like this is an all-or-nothing process, but it's a very interesting start.

