Word Cloud Visual

Devin Knight shows off the word cloud custom visual in Power BI:

Key Takeaways

  • Great for parsing unstructured data

  • Utilize stop words to remove commonly used filler words like a, the, an, etc…

    • You can use the default stop word that are provided and add your own that you would like to remove from the visual.
  • The size of the words in the visual tell you how frequently the word is used.

Cf. yesterday’s word cloud example.  I’m not sure how truly valuable word clouds are for visualization purposes, but at least they’re fun to peruse.

Nonclustered Columnstore Indexes On Indexed Views

Niko Neugebauer notes that non-clustered columnstore indexes can now sit on top of indexed views, as of SQL Server 2016:

From the perspective of the disk access, this is where you will definitely win at least a couple of times with the amount of the disk access while processing the information, amount of memory that you will need to store and process (think hashing and sorting for the late materialisation phases), and you will pay less for the occupied storage.

Another noticeable thing was that the memory grants for the Indexed Views query was smaller compared to the query that was processing the original columnstore table FactOnlineSales.

Clustered indexes are currently not available as an option; we’ll see if that changes in the next version of SQL Server.

Auditing In Power BI

Ginger Grant shows off some of the auditing capabilities within Power BI:

As you can see by looking at the available Power BI options, there are a number of options to choose from. If you select the top item PowerBI activities, then everything gets selected. After doing that click outside of the menu for the menu to go away. Select a date and time range of your choosing, select specific users if you wish, then click on the Search button. Depending on how big your date range is, this may take some time to load. Once you see the results, you have the ability to filter as well.

Another day, another two dozen new Power BI features…  This one’s a good one.

Word Clouds In Python

Allison Tharp shows how to generate a word cloud using Python:

Every week, someone on Reddit posts a “word cloud” on all of the NFL team’s subreddits.  These word clouds show the most used words on that subreddit for the week (the larger the word, the more it was used).  These word plots are always really fascinating to me, so I wanted to try to make some for myself.  In this tutorial, we’ll be making the following word cloud from my board game stats twitter feed, @BGGStats

Looks like the implementation is fairly straightforward, so check it out.

Azure Data Lake Updates

Saveen Reddy points out a few updates to Azure Data Lake Store & the Azure Data Lake Analytics portal:

Use Custom Delimeters when Previewing Files

Previously, we had supported comma, colon, space, tab, ampersand, and bar delimiters. With the many different kinds of files used in Azure Data Lake Store and Azure Storage, we’ve added a “Custom” delimiter options for you to define your own delimiter.

To change the delimiter on the Azure Portal:

  1. Open the file you want to preview using Data Explorer.

  2. Click on Format

  3. Under Delimiter, click the dropdown and change it to Custom

  4. A new Custom Delimiter field will appear, type in your delimiter here

  5. Click OK

Read on for more updates.

Change With Automatic Stats Update

Jack Li notes that SQL Server 2016 has changed when automatic statistics update gets called:

Old threshold: it takes 20% of row changes before auto update stats kicks (there are some tweaks for small tables, for large tables, 20% change is needed).  For a table with 100 million rows, it requires 20 million row change for auto stats to kick in. For vast majority of large tables, auto stats basically doesn’t do much.

New threshold: Starting SQL 2008 R2 SP1, we introduced a trace flag 2371 to control auto update statistics better (new threshold).  Under trace flag 2371, percentage of changes requires is dramatically reduced with large tables.  In other words, trace flag 2371 can cause more frequent update.  This new threshold is off by default and is enabled by the trace flag.  But in SQL 2016, this new threshold is enabled by default for a database with compatibility level 130.

Important to know.

Generating JSON In Power BI

Chris Webb shows how to generate JSON in M:

Often, when calling web services from Power BI or Power Query, you’ll need to generate some JSON inside your query to send to these web services. The M language makes this relatively easy to do with the Json.FromValue() function but there’s only one example of how to use it in the help so I though it might be useful to provide some worked examples of M data types and how Json.FromValue() turns them into JSON.

First, here’s a function – that I’ve called GetJson() for the examples here – that takes a parameter of any data type, passes it to Json.FromValue() and returns the JSON representation of the input as text:

Read on for the code sample.

T-SQL And R Performance Comparisons

Kevin Feasel



Tomaz Kastrun does several performance comparisons between various R packages and T-SQL constructs:

Couple of packages I will mention for data manipulations are plyr, dplyr and data.table and compare the execution time, simplicity and ease of writing with general T-SQL code and RevoScaleR package. For this blog post I will use R packagedplyr and T-SQL with possibilites of RevoScaleR computation functions.

My initial query will be. Available in WideWorldImportersDW database. No other alterations have been done to underlying tables (fact.sale or dimension.city).

Read on for code and conclusions.  I don’t think there are any shocking conclusions:  the upshot is to filter data as early as possible.

Always Encrypted Powershell Cmdlets

Sanjay Mishra alerts us to new Powershell cmdlets for enabling Always Encrypted on columns:

The July 2016 release of SSMS (and later versions) introduced a set of PowerShell cmdlets through a new ‘SqlServer’ module. This pagedescribes the various capabilities that these cmdlets bring to the table. Of most interest to the specific scenario described above is the Set-SqlColumnEncryption cmdlet. In the post below, we will walk through the steps required to use this – first from a PowerShell session to test the capability, and then finally from a C# application which is using PowerShell Automation to invoke the cmdlets from an application.

As a side note it is worth knowing that the cmdlets in the ‘SqlServer’ PowerShell module can also be used for automating key setup and management (and are, in many ways, more powerful than SSMS – they expose more granular tasks, and thus can be used to achieve role separation and to develop a custom key management workflow – but that is likely a topic for a separate post!)

Sanjay also includes a sample Powershell script to show how it works.

Starting Azure Stream Analytics Jobs From Code

Kevin Feasel



Hylke Peek wants to kick off an Azure Stream Analytics job from a Universal Windows Platform application:

I had one of those feelings while working with Azure Stream Analytics (ASA). My solution worked but there was one ‘elementary and simple’ thing I wanted: Start the ASA-jobs within my C#-code. That shouldn’t be hard and there’s some documentation. But no, I needed to combine several opposed solutions to a new one to make it possible.

In this post I shortly explain how you can start ASA-jobs within your C# UWP application:

  • I explain which components you have in the authentication process and which parameters you need.

  • Example code is provided. You only need to enter your parameter values.

Click through for the code.


January 2019
« Dec