It seemed that some of the rows in my CSV files exceeded the upper limit on what the Extractors.Csv function can handle, and adding the silent:true parameter didn’t solve the issue.
I dug a bit deeper and found rows in some of the files that were long – really long. One in particular was 47MB for a single row, and this was valid data. I could have edited these rows out by hand, but I thought I’d see if I could solve the problem another way.
After some internet research and a couple of helpful tweets to and from Michael Rys, I decided to have a go at making my own custom U-SQL extractor.
Phillip has included the custom extractor code, so if you find yourself needing to parse very large rows of data in U-SQL, you’ll definitely be interested in this.