Tar And Polybase

I look at what the deal is with Polybase and Tar files:

The select statement returned 3104 records, exactly 4 shy of the 3108 I would have expected (777 * 4 = 3108).  In each case, the missing row was the first, meaning when I search for LastName = ‘Turgeon’ (the first player in my data set), I get zero rows.  When I search for another second basemen in the set, I get back four rows, exactly as I would have expected.

What’s really interesting is the result I get back from Wireshark when I run a query without pushdown:  it does actually return the row for Casey Turgeon.

This isn’t an ideal scenario, but it did seem to be consistent in my limited testing.

Related Posts

MRAppMaster Errors Running MapReduce Jobs

I have a post looking at potential causes when PolyBase MapReduce jobs are unable to find the MRAppMaster class: Let me tell you about one of my least favorite things I like to see in PolyBase: Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster This error is not limited to PolyBase but is instead […]

Read More

Database-First or Kafka-First for Event Streaming

Gwen Shapiro takes us through a scenario where database-first writes for event streaming makes the most sense: Note that the DB does quite a lot for you: it enforces serializability, locks, your logical constraints, etc. If the DB is distributed (Vitesse, Cockroach, Spanner, Yugabyte), it does even more. If you were to go Kafka-first… well, […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930