Gigantic Row Custom U-SQL Extractor

Kevin Feasel

2017-09-04

U-SQL

Phillip Seamark has created a custom U-SQL extractor which handles rows larger than 4 MB:

It seemed some of the rows in my CSV files exceeded an upper limit on how much the Extractor.Csv function can handle and adding the silent:true parameter didn’t solve the issue.

I dug a bit deeper and found rows in some of the files that are long – really long. One in particular was 47MB long just for the row and this was valid data. I could have manually edited these out by hand but thought I’d see if I could solve another way.

After some internet research and a couple of helpful tweets to and from Michael Rys, I decided to have a go at making my own custom U-SQL extractor.

Phillip has included the custom extractor code, so if you find yourself needing to parse very large rows of data in U-SQL, you’ll definitely be interested in this.
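To make the problem concrete, here is a minimal Python sketch of the general idea behind reading very long rows: pull the file through in fixed-size chunks and split on the row delimiter yourself, so that no single row has to fit a preset buffer. This is not Phillip's extractor (his is a U-SQL custom extractor, and his post has the real code). The function names `iter_rows` and `oversized_rows`, the chunk size, and the CRLF delimiter are all assumptions made for illustration, and the sketch ignores quoted fields that contain embedded newlines.

```python
import io
from typing import BinaryIO, Iterator, Tuple

FOUR_MB = 4 * 1024 * 1024  # the row-size limit mentioned in the post


def iter_rows(stream: BinaryIO, delimiter: bytes = b"\n",
              chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield rows from a binary stream, reading it in fixed-size chunks.

    Hypothetical helper: rows may be arbitrarily long, because the buffer
    simply grows until the next delimiter shows up.
    """
    pending = bytearray()
    scan_from = 0  # offset in `pending` already known to hold no delimiter
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        start = 0
        while True:
            end = pending.find(delimiter, max(start, scan_from))
            if end == -1:
                break
            yield bytes(pending[start:end])
            start = end + len(delimiter)
            scan_from = start
        del pending[:start]  # drop the rows that were already emitted
        # A delimiter may straddle two chunks, so back up by len - 1 bytes.
        scan_from = max(0, len(pending) - len(delimiter) + 1)
    if pending:
        yield bytes(pending)  # last row without a trailing delimiter


def oversized_rows(stream: BinaryIO, limit: int = FOUR_MB,
                   delimiter: bytes = b"\n") -> Iterator[Tuple[int, int]]:
    """Yield (1-based row number, byte length) for rows longer than `limit`."""
    for number, row in enumerate(iter_rows(stream, delimiter), start=1):
        if len(row) > limit:
            yield number, len(row)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a CSV file with one unusually long row.
    sample = b"id,payload\r\n1,short\r\n2," + b"x" * 50 + b"\r\n3,end"
    for number, length in oversized_rows(io.BytesIO(sample), limit=20,
                                         delimiter=b"\r\n"):
        print(f"row {number} is {length} bytes")
```

A scan like `oversized_rows` is one way to find out which rows would trip a row-size limit before deciding how to handle them, for example by splitting or truncating a wide column.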

Related Posts

Overview: U-SQL Database Projects

Zach Stagers gives us an overview of the new U-SQL Database Project structure: Source Control The project integrates much more nicely with TFS than the older “U-SQL Project” does. It actually gives you the icons (padlock, check mark, etc.) in the solution explorer, so it actually looks like it’s under source control! Something that I’d really hoped […]


Reusing U-SQL Scripts

Kevin Feasel

2017-12-20

U-SQL

Matthew Hicks shows how to use PowerShell to parameterize U-SQL scripts: You can use this feature either via Azure Cloud Shell or on a Windows machine with Azure PowerShell installed. When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining […]

