External File Formats

I look at file formats in Polybase:

Delimited text is exactly as it sounds:  you can use a comma, tab, pipe, tilde, or any other delimiter (including multi-character delimiters).  So let’s go through the options here.  First, FORMAT_TYPE must be DELIMITEDTEXT.  From there, we have a few FORMAT_OPTIONS.  I mentioned FIELD_TERMINATOR, which is how we separate the values in a record.  We can also use STRING_DELIMITER if there are quotes or other markers around our string values.

DATE_FORMAT makes it easier for Polybase to understand how dates are formatted in your file.  The MSDN document gives you hints on how to use specific date formats, but you can’t define a custom format today, or even use multiple date formats.

It feels like there’s a new Hadoop file format every day.

Related Posts

Cassandra To Kafka Connect

Mike Barlotta shows how to feed data into Kafka from Cassandra via Kafka Connect.  Part one involves basic setup: Modeling data in Cassandra must be done around the queries that are needed to access the data (see this article for details). Typically this means that there will be one table for each query and data (in our […]

Read More

Use Cases For Apache Kafka

Amy Boyle shows a few scenarios where New Relic uses Apache Kafka: The Events Pipeline team is responsible for plumbing some of New Relic’s core data streams-specifically, event data. These are fine-grained nuggets of monitoring data that record a single event at a particular moment in time. For example, an event could be an error thrown […]

Read More


November 2016
« Oct Dec »