Polybase Statistics

I dig into a non-trivial Polybase query:

Polybase offers the ability to create statistics on tables, the same way that you would on normal tables.  There are a few rules about statistics:

  1. Stats are not auto-created.  You need to create all statistics manually.

  2. Stats are not auto-updated.  You will need to update all statistics manually, and currently, the only way you can do that is to drop and re-create the stats.

  3. When you create statistics, SQL Server pulls the data into a temp table, so if you have a billion-row table, you’d better have the tempdb space to pull that off.  To mitigate this, you can run stats on a sample of the data.

Round one did not end on a high note, so we’ll see what round two has to offer.

Related Posts

Running Hive LLAP As A YARN Service

Gour Saha, et al, demonstrate running Apache Hive LLAP as a YARN service: Making LLAP as a first-class YARN service also enables us to use some of the other powerful features in YARN that were added in Apache Hadoop 3.0 / 3.1, some of them are noted below. Advanced container placement scheduling such as affinity […]

Read More

Flattening JSON Data With Databricks

Ivan Vazharov gives us a Databricks notebook to parse and flatten JSON using PySpark: With Databricks you get: An easy way to infer the JSON schema and avoid creating it manually Subtle changes in the JSON schema won’t break things The ability to explode nested lists into rows in a very easy way (see the […]

Read More

Categories

June 2016
MTWTFSS
« May Jul »
 12345
6789101112
13141516171819
20212223242526
27282930