Devin Knight continues his Power BI visualization course:

In this module you will learn how to use the Histogram, a Power BI Custom Visual.  A Histogram is a column chart which shows the distribution of occurrences divided into categories, called bins.  This type of chart is useful for estimating density and discovering outliers.

Another fine entry in a great series.  Check it out.

Query Folding

Devin Knight discusses query folding and the View Native Query feature inside Power Query:

The idea behind Query Folding is to push the logic that you built into a Power BI query back to the data source server and execute it there in it’s native language instead of doing a client side transform of the data.  Why is this important?  Let me give you an example.  Say you have a 2 billion row SQL Server table you need to connect to in Power BI, but you want to filter to only return the last year of data.  With Query Folding the filter of that data is done on the SQL Server side instead of on the client side. If Query folding did not take place than that would mean all 2 billion rows would be brought across the network only to then filtered out on the client workstation.  So clearly the ideal situation is that all your queries get folded for the best possible performance, but Query Folding only works in certain scenarios.

I hadn’t heard the term “query folding” before, but the concept makes sense; in the PolyBase world, it’s “predicate pushdown.”  Check out Devin’s post, as he shows how easy it is to see to what extent your query is running client-side versus server-side.

Macroeconomic Charts

Riddhiman shows how to use R and plotly to build charts of Federal Reserve data sets:

In this post we’ll try to replicate some of the charts created by the Federal Reserve which visualize some well known macroeconomic indicators. We’ll also showcase the new Plotly 4.0 syntax.

This is a very code-heavy blog post and is a good way to learn about plotly.

DDL Events

Dave Mason looks at DDL triggers:

DDL triggers and the events they handle are run within the same transaction, which can be rolled back. This is a powerful and convenient feature. It gives you the ability to programmatically “undo” undesirable events. Let’s look at a variation of the previous DROP_TABLE trigger we created. This script will create a temporary logging table, drop the previous trigger, and create it again. This time, the trigger will roll back the transaction, preventing the table from being dropped. Then it inserts a message to the log table.

There are a few gotchas here, to be sure, but DDL triggers are powerful tools.

Power BI June Update

Dustin Ryan talks about this month’s batch of Power BI updates:

Searchable slicers are also a new feature in the latest release of Power BI Desktop. A couple days ago I wrote about some of my favorite custom visuals, which included the Smart Filter by SQLBI. I think I still prefer the Smart Filter in many situations, but the search-ability of the native Slicer is definitely a nice feature to have right out of the box.

The headline is row-level security, but there are several interesting features here.

Stretch Databases

Kevin Feasel



Tim Radney looks at Stretch Database functionality in RTM:

Prior to learning about this new billing method for DSU, I could make the argument that using Stretch Database would be a very cost effective method for storing cold data (unused data) into the cloud. By stretching this data into Azure, you could migrate a large portion of older data, which would decrease the size (and thus cost) of your local backups. In the event you had to restore a database, you would simply have to establish the connection to Azure for the stretched data, thus eliminating the need to restore it. However, with the minimal cost being nearly $1,000 per month for the low end DSU scale, many organizations will find that it is much cheaper to retain the data on a less expensive tier of storage within their data center and find other methods for HA such as mirroring, log shipping, or Availability Groups.

Read the whole thing.  Maybe V2 of stretch databases will fix some of the biggest problems (the cost, needing to pull all of your data back down before you make any schema changes, etc.) and become a viable feature, but I can’t see it being one today.

HDInsight Tool For Eclipse

Xiaoyong Zhu reports that the HDInight tool for Eclipse is now generally available:

The HDInsight Tool for Eclipse extends Eclipse to allow you to create and develop HDInsight Spark applications and easily submit Spark jobs to Microsoft Azure HDInsight Spark clusters using the Eclipse development environment.  It integrates seamlessly with Azure, enabling you to easily navigate HDInsight Spark clusters and to view associated Azure storage accounts. To further boost productivity, the HDInsight tool for Eclipse also offers the capability to view Spark job history and display detailed job logs.

Check out the link for videos and additional resources.

Query Store Bug

Brent Ozar reports on a bug in Query Store in Standard and Express editions:

If you’re using SQL Server 2016’s awesome new feature, Query Store, there’s a new bug with automatic cleanup. Books Online explains:

Automatic data cleanup fails on editions other than Enterprise and Developer. Consequently, space used by the Query Store will grow over time until configured limit is reached, if data is not purged manually. If not mitigated, this issue will also fill up disk space allocated for the error logs, as every attempt to execute cleanup will produce a dump file.

Here’s hoping that bug gets fixed quickly.

sp_help And Column Lengths

Jon Morisi looks at an oddity with the way that sp_help shows column length:

I was working with a contractor today who was having difficulty providing me back details on a table definition.  I was specifically interested in a particular column’s data type and size.  (This was related to an ETL process I was working on, and my desire to avoid any implicit conversions).

The reply I got back was, “the column you’re interested in is an nvarchar(100)”.  After continued digging and troubleshooting, I was eventually able to sort out that it was actually an nvarchar(50).

I put together this TEST table to illustrate where the confusion came from.  Can you spot what’s going on?

There’s an interesting explanation which makes me dislike sp_help just a little bit more.

Running Compiled Code In Azure ML

Max Kaznady shows how to use R or Python scripts to call compiled code within Azure ML:

In this post, we focus on sourcing R and Python’s external dependencies, such as R libraries and Python modules, which are not already installed on Azure ML and require code compilation. Commonly the compiled code comes from a variety of other languages such as C, C++ and Fortran. One could also use this approach to wrap their compiled code with R or Python wrappers and run it on Azure ML.

To illustrate the process, we will build two MurmurHash modules from C++ for R and Python using the following two implementations on GitHub, and link them to Azure ML from a zipped folder

Link via David Smith.  I knew it was possible to call compiled C code from Python and R, but didn’t expect to be able to do it within Azure ML, so that’s good to know.


February 2019
« Jan