Reusing U-SQL Scripts

Kevin Feasel

2017-12-20

U-SQL

Matthew Hicks shows how to use PowerShell to parameterize U-SQL scripts:

You can use this feature either via Azure Cloud Shell or on a Windows machine with Azure PowerShell installed.

When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining the variable in U-SQL.

Supported types include:

byte, sbyte, int, uint (or uint32), long, ulong (or uint64), float, double, decimal, short (or int16), ushort (or uint16), char, string, DateTime, bool, Guid, or byte[]
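On the U-SQL side, a script can give each expected parameter a default with `DECLARE EXTERNAL`. The script then still runs when submitted without `-ScriptParameter`, and a value submitted under the same variable name takes precedence over the default. This is a minimal sketch that assumes hypothetical variable names (`@inputPath`, `@outputPath`, `@minYear`) and a hypothetical two-column TSV input:

```usql
// Defaults used when the job is submitted without -ScriptParameter.
// A parameter submitted under the same variable name overrides each default.
DECLARE EXTERNAL @inputPath string = "/input/flights.tsv";      // hypothetical path
DECLARE EXTERNAL @outputPath string = "/output/flights.csv";    // hypothetical path
DECLARE EXTERNAL @minYear int = 2006;

// Hypothetical two-column schema for the input file.
@rows =
    EXTRACT Year int,
            Carrier string
    FROM @inputPath
    USING Extractors.Tsv();

// Keep only rows at or after the (possibly overridden) year.
@filtered =
    SELECT Year, Carrier
    FROM @rows
    WHERE Year >= @minYear;

OUTPUT @filtered
TO @outputPath
USING Outputters.Csv();
```

With this layout, the hashtable only needs entries for the values you want to change. Each value's .NET type should match the declared U-SQL type, for example an `int` for `@minYear`.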

Read on for an example of the process.

Fetching U-SQL Job Input And Output Paths

Matthew Hicks shows how to retrieve information on U-SQL input and output paths using PowerShell:

Each time you submit a U-SQL job, a job folder is created in your Azure Data Lake Store account. This folder contains useful debugging information about the job, including a file called the U-SQL algebra file. This is an XML file containing information about your job graph, the list of input and output files, and other key U-SQL job metadata.

We’ve just published a sample script that reads the U-SQL algebra file for a specified job and returns the input or output files. Give it a try!

Read on for more.

Inline U-SQL Functions

Kevin Feasel

2017-10-23

U-SQL

Damien Widera shows us how to write inline functions in U-SQL:

Now let’s go to the new thing: the undocumented usage of inline functions. My function is pretty simple. I can imagine the functions you write could be just as simple, though yours will probably do something more useful. To simplify the coding process, you can use an inline function in your U-SQL script instead of writing any code in the C# code-behind file.

The code could look like this:

```
DECLARE @in string = "/Data/Aircraft/2006ByMonth/{*}.tsv";
DECLARE @out string = "/Data/Aircraft/2006ByMonth/out/CSharpFunction.tsv";
DECLARE @func Func<int,int> = (s)=>{return s+1;};
```
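Once declared, the lambda can presumably be invoked per row like any other C# expression in a SELECT. This is a minimal sketch that assumes the TSV files have a hypothetical two-column layout (`Year`, `Month`). Because this usage is undocumented, verify it on your own account before relying on it:

```usql
DECLARE @in string = "/Data/Aircraft/2006ByMonth/{*}.tsv";
DECLARE @out string = "/Data/Aircraft/2006ByMonth/out/CSharpFunction.tsv";
// Inline C# lambda stored in a U-SQL variable: returns its argument plus 1.
DECLARE @func Func<int,int> = (s)=>{return s+1;};

// Hypothetical two-column schema; adjust to match the real files.
@flights =
    EXTRACT Year int,
            Month int
    FROM @in
    USING Extractors.Tsv();

// Call the lambda once per row, with no code-behind required.
@shifted =
    SELECT Year,
           @func(Month) AS NextMonth
    FROM @flights;

OUTPUT @shifted
TO @out
USING Outputters.Tsv();
```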

Things which make languages look more like functional languages generally get a thumbs up from me.

Copying Azure Data Lake Databases

Kevin Feasel

2017-10-23

U-SQL

Yanan Cai shows how to copy Azure Data Lake databases for local debugging and development:

The concept of a database is used to group related data structures and functions together. ADLA users have databases in their production environment that contain tables, assemblies, table valued functions and other objects. Previously, when developing and tuning U-SQL queries on a local machine, developers would have to manually recreate everything in their production database. After coding they would have to identify any changes to the database and then update the production account’s database. This process took extra time and introduced errors without adding any value.

Using the Export Wizard, developers can clone the existing database environment and sample data directly to the local account. Developers can also choose to export only parts of the database to the local database. Follow the steps below to export your U-SQL databases.

Click through for the step-by-step process.

Scheduled U-SQL Jobs With Azure Data Factory

Melissa Coates shows how to schedule Azure Data Factory workflows to run U-SQL:

This post is a continuation of the blog post where I discussed using U-SQL to standardize JSON input files, which vary in format from file to file, into a consistent standardized CSV format that’s easier to work with downstream. Now let’s talk about how to make this happen on a schedule with Azure Data Factory (ADF).

This was all done with Version 1 of ADF. I have not tested this yet with the ADF V2 Preview which was just released.

It’s a bit lengthy, but Melissa lays it out step-by-step, making it straightforward to follow.

Multi-Structured Data In U-SQL

Kevin Feasel

2017-09-06

JSON, U-SQL

Melissa Coates shows us how to use U-SQL to normalize JSON files in which different rows may have differing structures:

Handling the varying formats in U-SQL involves a few steps if it’s the first time you’ve done this:

  1. Upload custom JSON assemblies  [one time setup]

  2. Create a database   [one time setup]

  3. Register custom JSON assemblies   [one time setup]

  4. Upload JSON file to Azure Data Lake Store [manual step as an example–usually automated]

  5. Run U-SQL script to “standardize” the JSON file(s) into a consistent CSV column/row format
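The one-time setup and the standardization step can be sketched in U-SQL as two separate jobs, because an assembly generally has to be registered before a script can reference it. The sketch assumes the sample JSON assemblies from Microsoft's U-SQL GitHub repository (`Newtonsoft.Json` and `Microsoft.Analytics.Samples.Formats`) have already been uploaded. The database name, folder paths, and the `id`/`name`/`city` columns are hypothetical, and this is not necessarily how Melissa structures her scripts:

```usql
// Job 1 (one-time setup): create a database and register the uploaded assemblies.
CREATE DATABASE IF NOT EXISTS JsonDemo;   // hypothetical database name
USE DATABASE JsonDemo;

CREATE ASSEMBLY IF NOT EXISTS [Newtonsoft.Json]
FROM @"/Assemblies/JSON/Newtonsoft.Json.dll";                       // hypothetical upload path
CREATE ASSEMBLY IF NOT EXISTS [Microsoft.Analytics.Samples.Formats]
FROM @"/Assemblies/JSON/Microsoft.Analytics.Samples.Formats.dll";   // hypothetical upload path
```

Once those are registered, a separate job can extract and standardize the files:

```usql
// Job 2: read every JSON file in the folder and write one consistent CSV layout.
USE DATABASE JsonDemo;
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Hypothetical target columns; JSON properties are matched to columns by name.
@raw =
    EXTRACT id string,
            name string,
            city string
    FROM "/Input/Json/{*}.json"
    USING new JsonExtractor();

OUTPUT @raw
TO "/Output/standardized.csv"
USING Outputters.Csv();
```

Listing every column the standardized output should contain lets files that lack some properties still produce the same column layout. This assumes the extractor returns null for a property missing from a given file.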

Melissa then shows us how to do this step-by-step.

Gigantic Row Custom U-SQL Extractor

Kevin Feasel

2017-09-04

U-SQL

Phillip Seamark has created a custom U-SQL extractor which handles rows larger than 4 MB:

It seemed some of the rows in my CSV files exceeded an upper limit on how much the Extractors.Csv function can handle, and adding the silent:true parameter didn’t solve the issue.

I dug a bit deeper and found rows in some of the files that are long, really long. One in particular was 47 MB long just for the row, and this was valid data. I could have manually edited these out by hand but thought I’d see if I could solve it another way.

After some internet research and a couple of helpful tweets to and from Michael Rys, I decided to have a go at making my own custom U-SQL extractor.

Phillip has included the custom extractor code, so if you find yourself needing to parse very large rows of data in U-SQL, you’ll definitely be interested in this.

Building A Tally Table With U-SQL

Kevin Feasel

2017-08-23

U-SQL

Saveen Reddy shows how to create a tally (or numbers) table in U-SQL:

Many common scenarios for U-SQL developers require constructing a RowSet made up of a simple range of numbers or dates, for example the integers from 1 to 10. In this blog post we’ll take a look at options for doing this in U-SQL. In the process, we’ll get a chance to learn how to use some common U-SQL features:

  • Creating RowSets from constant values

  • Using CROSS JOIN

  • Using SELECT to map integers to DateTimes

  • Using CREATE TABLE to create a table directly from a RowSet. This is sometimes called “CREATE TABLE AS SELECT” and often abbreviated as “CTAS”.
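Putting those features together, here is a minimal sketch that builds the integers 1 to 100 from ten constant digits, maps them to dates, and persists the result with CTAS. The database and table names, the start date, and the index choice are hypothetical, and this is not necessarily how Saveen builds his version:

```usql
CREATE DATABASE IF NOT EXISTS TallyDemo;   // hypothetical database name
USE DATABASE TallyDemo;

// RowSet from constant values: the digits 0 through 9.
@digits =
    SELECT *
    FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS T(d);

// CROSS JOIN yields 10 x 10 = 100 rows; tens * 10 + ones + 1 covers 1 to 100.
@numbers =
    SELECT tens.d * 10 + ones.d + 1 AS n
    FROM @digits AS tens
         CROSS JOIN @digits AS ones;

// Map each integer to a DateTime, counting from a hypothetical start date.
@dates =
    SELECT n,
           new DateTime(2017, 1, 1).AddDays(n - 1) AS CalendarDate
    FROM @numbers;

// CTAS: create and populate the table directly from the RowSet.
DROP TABLE IF EXISTS dbo.Calendar;
CREATE TABLE dbo.Calendar
(
    INDEX idx_n CLUSTERED (n ASC) DISTRIBUTED BY HASH (n)
) AS SELECT * FROM @dates;
```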

Click through to learn more.

Generating U-SQL Extract Scripts From Visual Studio

Yanan Cai shows a GUI for creating U-SQL EXTRACT scripts via Azure Data Lake Tools for Visual Studio:

One of U-SQL’s core capabilities is the ability to schematize unstructured data on the fly without having to create a metadata object for it. This capability is provided by the EXTRACT expression, which invokes either a user-defined or built-in extractor to process the input file or set of files specified in the FROM clause and produces a rowset whose schema is specified in the EXTRACT clause.
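For reference, a hand-written EXTRACT lists one column name and type per column, in file order. That is roughly the kind of statement the tool generates for you. The sketch assumes the `SearchLog.tsv` sample file and its seven-column layout from the Azure Data Lake Analytics samples; substitute your own path and schema:

```usql
// Columns must match the file's columns in order and count.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,        // nullable: empty values become null
            Urls string,
            ClickedUrls string
    FROM "/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

OUTPUT @searchlog
TO "/output/SearchLog_copy.csv"   // hypothetical output path
USING Outputters.Csv();
```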

When using a built-in extractor to schematize semi-structured data, such as data in a .csv file, writing the schema definition in U-SQL by hand is slow and error-prone, especially when the .csv file contains hundreds of columns.

Recently, we released a new feature in the latest version of Azure Data Lake Tools for Visual Studio to help you generate this U-SQL EXTRACT statement automatically.

Click through for an example as well as a video showing the process.

Row Headers In U-SQL

Kevin Feasel

2017-07-28

U-SQL

Melissa Coates shows how to handle row headers in CSV files when writing U-SQL queries:

This is a quick tip about syntax for handling row headers in U-SQL, the data processing language of Azure Data Lake Analytics. There are two components: handling row headers on the source data which is being queried, and row headers on the dataset being generated by ADLA.
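Both halves can be handled with parameters on the built-in extractor and outputter. This is a minimal sketch that assumes a hypothetical two-column `sales.csv` whose first row is a header:

```usql
// Source side: skip the header row when reading.
@sales =
    EXTRACT Region string,
            Amount decimal
    FROM "/input/sales.csv"            // hypothetical input with a header row
    USING Extractors.Csv(skipFirstNRows: 1);

@totals =
    SELECT Region,
           SUM(Amount) AS TotalAmount
    FROM @sales
    GROUP BY Region;

// Output side: write column names as the first row.
OUTPUT @totals
TO "/output/sales_totals.csv"          // hypothetical output path
USING Outputters.Csv(outputHeader: true);
```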

Click through for the one-liners as well as sample queries.
