Tar And Polybase

I look into how Polybase handles tar files:

The select statement returned 3104 records, exactly 4 shy of the 3108 I would have expected (777 * 4 = 3108). In each case, the missing row was the first row, meaning that when I search for LastName = ‘Turgeon’ (the first player in my data set), I get zero rows. When I search for another second baseman in the set, I get back four rows, exactly as I would have expected.

What’s really interesting is the result I get back from Wireshark when I run the query without pushdown: it does actually return the row for Casey Turgeon.

This isn’t an ideal scenario, but it did seem to be consistent in my limited testing.
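One possible explanation for the missing first rows is that a reader which ignores tar framing sees each file's 512-byte tar header glued onto that file's first record. This is an assumption; the observations above do not confirm it. The sketch below builds a small uncompressed archive in memory and splits the raw bytes on newlines, using only Python's standard `tarfile` module. The file names and every row other than the first are made-up stand-in data; only the counts (4 files, 777 rows each) and the name Turgeon come from the text above.

```python
import io
import tarfile

ROWS_PER_FILE = 777  # from the post: 777 rows in each file
FILE_COUNT = 4       # from the post: four files in the archive


def make_csv(rows):
    # Synthetic stand-in data; only the first row uses a name from the post.
    lines = ["Turgeon,Casey"] + [f"Player{i},Test" for i in range(1, rows)]
    return "\n".join(lines) + "\n"


def build_tar(file_count, rows):
    # Write an uncompressed ustar archive into memory and return its raw bytes.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for n in range(file_count):
            data = make_csv(rows).encode("ascii")
            info = tarfile.TarInfo(name=f"players{n}.csv")  # hypothetical file name
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def naive_clean_rows(raw):
    # Split on newlines with no knowledge of tar framing. Any chunk that
    # contains NUL bytes has header or padding bytes attached, so drop it.
    return [line for line in raw.split(b"\n") if line and b"\x00" not in line]


raw = build_tar(FILE_COUNT, ROWS_PER_FILE)
clean = naive_clean_rows(raw)
expected = ROWS_PER_FILE * FILE_COUNT

print(expected, len(clean), expected - len(clean))  # 3108 3104 4
print(any(row.startswith(b"Turgeon") for row in clean))  # False
```

Under this assumption, exactly one row per file is lost, and it is always the first one, which matches the 3104-versus-3108 gap. It does not by itself explain why the row shows up in the Wireshark capture without pushdown, so treat it as an illustration of the tar layout and not as a diagnosis of Polybase's behavior.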
