U-SQL Movie Recommender

Kevin Feasel



Dave Ballantyne introduces us to U-SQL via a movie recommender:

What follows is an overview of my experiments that i have published into a GitHib repo. The “Examples” folder are what i would term “simple learnings” and “Full Scripts” are scripts that to a lesser or greater extent do something “useful”.  Im also not suggesting that anything here is “best practice” or method A performs better than method B,  I simply do not have the required size of data to make that call. My aim was to learn the language.

TLDR: Check out the script MovieLens09-CosineSimilarityFromCSVWithMax.usql for a U-SQL movie recommender.

U-SQL was introduced last year, but word of mouth about the language has been quite limited to date.  I’ll be interested in seeing what other examples pop up over the next few months.

Invalid Perfmon Calculations

Paul Popovich notes that certain Perfmon counters could be wrong on certain versions of Windows:

Performance Monitor uses incorrect calculation for certain types of counters in Windows 8, Windows Server 2012, Windows 7 SP1, or Windows Server 2008 R2 SP1

This only cost us a week of reviewing results.

Follow up on the link because there’s a fix available through Windows Update.

Local Aggregation

Niko Neugebauer investigates a new line in the Columnstore Index Scan execution plan tooltip, Actual Number of Locally Aggregated Rows:

There is a new line in the properties of the iterator, showing the number of locally aggregated rows and that number equals 619255, that should be exactly the number of rows that is missing from the arrow connecting 2 iterators:

Gives us our perfect 12627608 rows.
Is there any more information on this operation?
Indeed, just right-click on the Columnstore Index Scan and select it’s properties:

This is tied to some columnstore performance improvements in SQL Server 2016.

Reorganize Columnstore Indexes

I have a new script available to reorganize columnstore indexes:

Note that this script requires SQL Server 2016 (or later) because the database engine team made some great changes to columnstore indexes, allowing us to use REORGANIZE to clear out deleted rows and compact row groups together, as well as its previous job of marking open delta stores as available for compression.

The code is available as a Gist for now, at least until I decide what to do with it.  Comments are welcome, especially if I’m missing a major reorganize condition.

As mentioned, comments are welcome.

Generating Fixed-Width Files With Power Query

Chris Webb shows how to generate fixed-width files using Power Query inside Excel:

While it’s fairly common to need to load fixed-width files using Power Query or Power Query (and there’s a nice walkthrough of how to do this here), occasionally you might want to use Power Query and Excel to create a fixed-width output for another system, or maybe to create some test data. You might not want to do it often but I can imagine that when/if Power Query is integrated into SSIS this will be a slightly less obscure requirement; at the very least, this post should show you how to use a couple of M functions that are under-documented.

I don’t see this being a particularly common request, but I guess I can see some scenario in which we’re loading data into a legacy system.

Trusting Foreign Keys

Jefferson Elias describes the concept of trusted foreign keys as well as their analog:

Specifying WITH CHECK in a statement tells to SQL Server the user wants it to validate the constraint against every single row in the table, then, if successful, enable it.

In contrast, specifying WITH NOCHECK, which is the default for an existing constraint, means that the constraint is enabled but no validation has been made on it. Even if this mode is faster to run, it can lead to severe side effects on performance: SQL Server doesn’t trust the constraint as it has not validated it. We refer to such a foreign key as an « untrusted foreign key ». As a consequence, the query optimizer won’t use the constraint to do his job…

There are benefits to having trusted foreign key constraints.  Check out the article for more details as well as how to fix this issue.


March 2016
« Feb Apr »