Polybase Statistics

I dig into a non-trivial Polybase query:

Polybase offers the ability to create statistics on external tables, the same way that you would on normal tables.  There are a few rules about statistics:

  1. Stats are not auto-created.  You need to create all statistics manually.

  2. Stats are not auto-updated.  You will need to update all statistics manually, and currently, the only way you can do that is to drop and re-create the stats.

  3. When you create statistics, SQL Server pulls the data into a temp table, so if you have a billion-row table, you’d better have the tempdb space to pull that off.  To mitigate this, you can create the statistics on a sample of the data instead of a full scan.
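
As a rough illustration of the three rules above, here is a minimal T-SQL sketch. The table name dbo.Flights, the column Dest, the statistics name st_Flights_Dest, and the 10 percent sample size are all hypothetical and not taken from the original post; the sketch assumes dbo.Flights already exists as a Polybase external table.

```sql
-- Hypothetical names: dbo.Flights is assumed to be an existing external table
-- and Dest is assumed to be one of its columns.

-- Rule 1: statistics are not auto-created, so create them by hand.
-- Rule 3: WITH SAMPLE asks SQL Server to build the statistics from a sample
-- rather than a full scan, which should reduce the tempdb space required.
CREATE STATISTICS st_Flights_Dest
ON dbo.Flights (Dest)
WITH SAMPLE 10 PERCENT;

-- Rule 2: statistics are not auto-updated, so "updating" means
-- dropping the statistics object and creating it again.
DROP STATISTICS dbo.Flights.st_Flights_Dest;

CREATE STATISTICS st_Flights_Dest
ON dbo.Flights (Dest)
WITH SAMPLE 10 PERCENT;
```

Replacing WITH SAMPLE 10 PERCENT with WITH FULLSCAN would build the statistics from every row, at the cost of pulling the whole external table into a temp table first.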

Round one did not end on a high note, so we’ll see what round two has to offer.
