External File Formats

I look at file formats in Polybase:

Delimited text is exactly as it sounds:  you can use a comma, tab, pipe, tilde, or any other delimiter (including multi-character delimiters).  So let’s go through the options here.  First, FORMAT_TYPE must be DELIMITEDTEXT.  From there, we have a few FORMAT_OPTIONS.  I mentioned FIELD_TERMINATOR, which is how we separate the values in a record.  We can also use STRING_DELIMITER if there are quotes or other markers around our string values.

DATE_FORMAT makes it easier for Polybase to understand how dates are formatted in your file.  The MSDN document gives you hints on how to use specific date formats, but you can’t define a custom format today, or even use multiple date formats.

It feels like there’s a new Hadoop file format every day.

Related Posts

What’s New In Ambari 2.7

Paul Codding and Kat Petre share some of the new features in Ambari 2.7: With this release, we wanted to make Ambari more enjoyable to use every day, simplify finding and using our API, and unblock teams managing very large clusters.  Here is a preview of a few features we’re excited to share with you:Revamped […]

Read More

Working With Images In Spark 2.4

Tomas Nykodym and Weichen Xu give us an update on working with images in the most recent version of Apache Spark: An image data source addresses many of these problems by providing the standard representation you can code against and abstracts from the details of a particular image representation.Apache Spark 2.3 provided the ImageSchema.readImages API (see Microsoft’s post […]

Read More

Categories

November 2016
MTWTFSS
« Oct Dec »
 123456
78910111213
14151617181920
21222324252627
282930