External File Formats

I look at file formats in Polybase:

Delimited text is exactly as it sounds:  you can use a comma, tab, pipe, tilde, or any other delimiter (including multi-character delimiters).  So let’s go through the options here.  First, FORMAT_TYPE must be DELIMITEDTEXT.  From there, we have a few FORMAT_OPTIONS.  I mentioned FIELD_TERMINATOR, which is how we separate the values in a record.  We can also use STRING_DELIMITER if there are quotes or other markers around our string values.

DATE_FORMAT makes it easier for Polybase to understand how dates are formatted in your file.  The MSDN document gives you hints on how to use specific date formats, but you can’t define a custom format today, or even use multiple date formats.

It feels like there’s a new Hadoop file format every day.

Related Posts

Replicating Data In HDFS Between Clusters

Murali Ramasami and Niru Anisetti have an article showing how to use the Hortonworks Data Lifecycle Manager to set up replication between two Hadoop clusters: Data Lifecycle Manager (DLM) delivers on the promise of location-agnostic, secure replication by encapsulating and copying data seamlessly across physical private storage and public cloud environments. This empowers businesses to […]

Read More

Installing Confluent Platform On Windows

Niels Berglund shows how to install Confluent Platform (the Confluent branded version of Apache Kafka) on a Windows machine using the Windows Subsystem for Linux: WSL is primarily aimed at developers, and it allows you to run Linux environments directly on Windows in a native format and without the overhead of a virtual machine. Let us […]

Read More


November 2016
« Oct Dec »