Gigantic Row Custom U-SQL Extractor

Kevin Feasel

2017-09-04

U-SQL

Phillip Seamark has created a custom U-SQL extractor which handles rows larger than 4 MB:

It seemed some of the rows in my CSV files exceeded an upper limit on how much the Extractors.Csv function can handle and adding the silent:true parameter didn’t solve the issue.

I dug a bit deeper and found rows in some of the files that are long – really long. One in particular was 47MB long just for the row and this was valid data. I could have manually edited these out by hand but thought I’d see if I could solve it another way.

After some internet research and a couple of helpful tweets to and from Michael Rys, I decided to have a go at making my own custom U-SQL extractor.

Phillip has included the custom extractor code, so if you find yourself needing to parse very large rows of data in U-SQL, you’ll definitely be interested in this.
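Before reaching for a custom extractor, it can help to confirm that oversized rows really are the problem. The following is a minimal sketch in Python, not Phillip's code: the function names are hypothetical, it treats the 4 MB figure as 4 * 1024 * 1024 bytes, and it assumes one row per line (quoted fields containing newlines are not handled).

```python
import io

# Assumption: the 4 MB limit mentioned above, taken as 4 MiB.
ROW_LIMIT_BYTES = 4 * 1024 * 1024


def find_oversized_rows(stream, limit=ROW_LIMIT_BYTES):
    """Yield (row_number, byte_length) for each row longer than `limit`.

    `stream` is a binary file-like object. Rows are assumed to be
    newline-delimited; the line terminator is not counted.
    """
    for row_number, line in enumerate(stream, start=1):
        length = len(line.rstrip(b"\r\n"))
        if length > limit:
            yield row_number, length


def scan_file(path, limit=ROW_LIMIT_BYTES):
    """Hypothetical helper: list the oversized rows in the file at `path`."""
    with open(path, "rb") as handle:
        return list(find_oversized_rows(handle, limit))
```

Running scan_file against a local copy of each input file returns the row numbers and sizes that would need special handling, which makes it easier to decide whether a custom extractor is worth the effort.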

Related Posts

Scheduled U-SQL Jobs With Azure Data Factory

Melissa Coates shows how to schedule Azure Data Factory workflows to run U-SQL: This post is a continuation of the blog where I discussed using U-SQL to standardize JSON input files which vary in format from file to file, into a consistent standardized CSV format that’s easier to work with downstream. Now let’s talk about how to […]


Multi-Structured Data In U-SQL

Kevin Feasel

2017-09-06

JSON, U-SQL

Melissa Coates shows us how to use U-SQL to normalize JSON files in which different rows may have differing structures: Handling the varying formats in U-SQL involves a few steps if it’s the first time you’ve done this: Upload custom JSON assemblies [one time setup]; Create a database [one time setup]; Register custom JSON assemblies […]

