The projects integrates much more nicely with TFS than the older “U-SQL Project” does.
It actually gives you the icons (padlock, check mark, etc..) in the solution explorer, so it actually looks like it’s under source control!
Something that I’d really hoped had been fixed, but hasn’t, is when copying and renaming an existing item, it doesn’t recognize the rename. You have to undo the checkout of the non-existent object (the copy, before being renamed):
Read on for more improvements.
When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining the variable in U-SQL.
Supported types include:
byte, sbyte, int, uint (or uint32), long, ulong (or uint64), float, double, decimal, short (or int16), ushort (or uint16), char, string, DateTime, bool, Guid, or byte
Read on for an example of the process.
Each time you submit a U-SQL job, a job folder is created in your Azure Data Lake Store account. This folder contains useful debugging information about the job, including a file called the U-SQL algebra file. This is an XML file containing information about your job graph, the list of input and output files, and other key U-SQL job metadata.
We’ve just published a sample script that reads the U-SQL algebra file for a specified job and returns the input or output files. Give it a try!
Read on for more.
Now let’s go to the new thing – undocumented usage of inline functions. My function is pretty simple and I can imagine that function you will write could be as simple as mine but your functions will probably do something more useful. To simplyfy the coding process you could use inline function in your USQL script and not have to write any code in the C# file.
The could could look like this:
Things which make languages look more like functional languages generally get a thumbs up from me.
The concept of a database is used to group related data structures and functions together. ADLA users have databases in their production environment that contain tables, assemblies, table valued functions and other objects. Previously, when developing and tuning U-SQL queries on a local machine, developers would have to manually recreate everything in their production database. After coding they would have to identify any changes to the database and then update the production account’s database. This process took extra time and introduced errors without adding any value.
Using the Export Wizard, developers can clone the existing database environment and sample data directly to the local account. Developers can also choose to export only parts of the database to the local database. Follow below steps to export your U-SQL databases.
Click through for the step-by-step process.
This post is a continuation of the blog where I discussed using U-SQL to standardize JSON input files which vary in format from file to file, into a consistent standardized CSV format that’s easier to work with downstream. Now let’s talk about how to make this happen on a schedule with Azure Data Factory (ADF).
This was all done with Version 1 of ADF. I have not tested this yet with the ADF V2 Preview which was just released.
It’s a bit lengthy, but Melissa lays it out step-by-step, making it straightforward to follow.
Handling the varying formats in U-SQL involves a few steps if it’s the first time you’ve done this:
Upload custom JSON assemblies [one time setup]
Create a database [one time setup]
Register custom JSON assemblies [one time setup]
Upload JSON file to Azure Data Lake Store [manual step as an example–usually automated]
Run U-SQL script to “standardize” the JSON file(s) into a consistent CSV column/row format
Melissa then shows us how to do this step-by-step.
It seemed some of the rows in my CSV files exceeded an upper limit on how much the Extractor.Csv function can handle and adding the silent:true parameter didn’t solve the issue.
I dug a bit deeper and found rows in some of the files that are long – really long. One in particular was 47MB long just for the row and this was valid data. I could have manually edited these outs by hand but thought I’d see if I could solve another way.
After some internet research and a couple of helpful tweets to and from Michael Rys, I decided to have a go at making my own custom U-SQL extractor.
Phillip has included the custom extractor code, so if you find yourself needing to parse very large rows of data in U-SQL, you’ll definitely be interested in this.
Many common scenarios for U-SQL developers require constructing a RowSet made up of a simple range of numbers or dates, for example the integers from 1 to 10. In this blog post we’ll take a look at options for doing this in U-SQL. In the process, we’ll get a chance to learn how to use some common U-SQL features:
Creating RowSets from constant values
Using CROSS JOIN
Using SELECT to map integers to DateTimes
Using CREATE TABLE to create a table directly from a RowSet. This is sometimes called “CREATE TABLE AS SELECT” and often abbreviated as “CTAS“.
Click through to learn more.
One of U-SQL’s core capabilities is to be able to schematize unstructured data on the fly without having to create a metadata object for it. This capability is provided by the EXTRACT expression that will invoke either a user-defined extractor or built-in extractor to process the input file or set of files specified in the FROM clause and produces a rowset whose schema is specified in the EXTRACT clause.
While using the build-in extractor to schema semi-structured data, like data in .csv file, the schema definition in U-SQL is slow and error prone, especially for the .csv file contains hundreds of columns.
Recently, we released a new feature in the latest version of Azure Data Lake Tools for Visual Studio to help you generate this U-SQL EXTRACT statement automatically.
Click through for an example as well as a video showing the process.