Polybase Statistics

I dig into a non-trivial Polybase query:

Polybase offers the ability to create statistics on external tables, the same way that you would on normal tables.  There are a few rules about statistics:

  1. Stats are not auto-created.  You need to create all statistics manually.

  2. Stats are not auto-updated.  You will need to update all statistics manually, and currently the only way to do that is to drop and re-create the stats.

  3. When you create statistics, SQL Server pulls the data into a temp table, so if you have a billion-row table, you’d better have the tempdb space to pull that off.  To mitigate this, you can run stats on a sample of the data.
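The three rules above can be turned into a small script generator. This is only a sketch, not the original author's method: the helper name `refresh_external_stats_sql` and the table, statistic, and column names in the usage lines are hypothetical. It emits standard T-SQL `DROP STATISTICS` and `CREATE STATISTICS` statements, using `WITH SAMPLE n PERCENT` to limit how much data is pulled into tempdb and `WITH FULLSCAN` otherwise.

```python
def refresh_external_stats_sql(table, stat_name, columns, sample_percent=None):
    """Build T-SQL that drops and re-creates one statistics object.

    Hypothetical helper for illustration only. It returns the statements
    as strings and does not connect to SQL Server.
    """
    if not columns:
        raise ValueError("at least one column is required")
    if sample_percent is not None and not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be greater than 0 and at most 100")

    column_list = ", ".join(f"[{c}]" for c in columns)
    # Sampling reduces the amount of data copied into tempdb (rule 3).
    if sample_percent is None:
        scan = "FULLSCAN"
    else:
        scan = f"SAMPLE {sample_percent} PERCENT"

    return [
        # Rule 2: there is no in-place update, so drop first. Skip this
        # statement on the very first run, when nothing exists to drop.
        f"DROP STATISTICS {table}.[{stat_name}];",
        # Rule 1: nothing is auto-created, so create it explicitly.
        f"CREATE STATISTICS [{stat_name}] ON {table} ({column_list}) WITH {scan};",
    ]


# Usage with made-up names: a 10 percent sample on one column.
for statement in refresh_external_stats_sql(
    "dbo.ExternalFlights", "st_Origin", ["Origin"], sample_percent=10
):
    print(statement)
```

Generating the statements as text keeps the sketch independent of any particular database driver; the output can be pasted into a query window or run from a scheduled job whenever the external data changes enough to matter.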

Round one did not end on a high note, so we’ll see what round two has to offer.
