PolyBase Data Loading

Meagan Longoria explains that loading data using PolyBase can be finicky:

First let me say that PolyBase is cool. I can query data in text files and join to tables in my database. Next let me say PolyBase is a fairly young technology and has some limitations that I imagine will be improved in later versions.

One of those limitations (as of July 30, 2016) is that while you can declare your field delimiter and a string delimiter in external file formats, the row delimiter is not user configurable and there is no way to escape or ignore the row delimiter characters (\r, \n, or \r\n) inside of a string. So if you have a string that contains the row delimiter, PolyBase will interpret it as the end of the row even if it is placed inside of the string delimiters.

This is definitely something to keep in mind. I haven’t dealt with data that has newlines within attributes, so I haven’t run into this yet, but don’t let it bite you.
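
One way to sidestep the limitation is to clean the file before PolyBase ever sees it. The following is a minimal sketch in Python, not something from Meagan's post: it assumes a comma-delimited file with double-quoted strings, and the function name `flatten_embedded_newlines` and the choice of a space as the replacement character are my own placeholders. It parses the text with a CSV reader that understands quoted line breaks, then rewrites each row so that no field contains \r or \n.

```python
import csv
import io


def flatten_embedded_newlines(raw: str, replacement: str = " ") -> str:
    """Return CSV text in which no field contains a \\r or \\n character.

    Hypothetical helper: assumes comma-delimited input with double-quoted strings.
    """
    # newline="" keeps embedded line breaks untouched so the csv module
    # can tell a quoted newline apart from a real end of row.
    reader = csv.reader(io.StringIO(raw, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        cleaned = [
            field.replace("\r\n", replacement)
                 .replace("\r", replacement)
                 .replace("\n", replacement)
            for field in row
        ]
        writer.writerow(cleaned)
    return out.getvalue()


# The second data row has a line break inside its quoted comment.
sample = 'id,comment\n1,"first line\nsecond line"\n2,"plain"\n'
result = flatten_embedded_newlines(sample)
print(result)
```

With the sample above, the three logical rows come out as exactly three physical lines, so a row delimiter can no longer appear inside a string. Whether a space is an acceptable stand-in for the original line break depends on your data; you could pass a different `replacement` if you need to restore the breaks after loading.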

