Press "Enter" to skip to content

Curated SQL Posts

Updated Power BI Violin Chart

Meagan Longoria reviews an update to Daniel Marsh-Patrick’s violin chart custom visual in Power BI:

First, the violin plot is now a certified custom visual. This means that it has been tested by the Power BI team to ensure it meets certain requirements, one of which is that the visual does not access external services or resources. You can be confident your data isn’t being sent externally when you use the violin plot.

As for the functional enhancements, a new legend has been added. This is a great addition that makes the chart clearer and easier to read, especially for audiences that may not be familiar with how a violin plot works. The customizable legend calls out which markers are used for the mean, median, and quartiles.

Meagan is quite pleased with these updates.


Explaining Tree-Based Algorithms

Stephanie Glen takes us through quick explanations of decision trees, random forests, and gradient boosting:

The three methods are similar, with a significant amount of overlap. In a nutshell:

– A decision tree is a simple decision-making diagram.
– Random forests are a large number of trees, combined (using averages or “majority rules”) at the end of the process.
– Gradient boosting machines also combine decision trees, but start the combining process at the beginning, instead of at the end.

Read on for more details. All three are useful algorithms serving similar but slightly different purposes.
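All three are available in scikit-learn behind the same fit/predict interface, which makes the relationship easy to see in code. Here is a quick sketch (the dataset and hyperparameters are illustrative, not from Stephanie's post):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # A single decision-making diagram.
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Many trees trained independently, combined by majority vote at the end.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Trees built sequentially, each one correcting its predecessors' errors.
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```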


Query Store Required Permissions

Andreas Wolter takes us through exactly which permissions are required for Query Store to work:

Typically, there are three aspects to working with the Query Store, which can be reflected in roles:

1) Configuration – turning Query Store on and off, clearing the contents, flushing its contents to disk, and changing its settings.
2) Viewing the reports or using the DMVs to analyze the Query Store contents (queries, plans, and wait statistics) to gain insights, but not necessarily having the authority to change anything.
3) Actively changing plans by forcing or un-forcing them, based on the information obtained from (2).

This is a nice overview of the problem and a fair amount of the solution.
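To make that concrete, here is a minimal sketch of how those three aspects might map to database roles. The role names are hypothetical, and the permissions are the commonly documented ones (ALTER on the database for configuration and plan forcing, VIEW DATABASE STATE for read-only analysis); Andreas's post covers the details and the gaps in this model.

```sql
-- Hypothetical role names; permissions follow the three aspects above.

-- (1) Configuration: ALTER on the database covers turning Query Store
--     on and off, clearing it, flushing it, and changing its settings.
CREATE ROLE QueryStoreAdministrator;
GRANT ALTER TO QueryStoreAdministrator;

-- (2) Analysis: VIEW DATABASE STATE allows reading the Query Store
--     catalog views (sys.query_store_query, sys.query_store_plan, ...)
--     without the ability to change anything.
CREATE ROLE QueryStoreViewer;
GRANT VIEW DATABASE STATE TO QueryStoreViewer;

-- (3) Plan forcing: sp_query_store_force_plan and sp_query_store_unforce_plan
--     also require ALTER on the database, so they fall to the first role.
```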


Kubernetes on Windows

Elton Stoneman helps us get started with Kubernetes on Windows boxes:

Now you can take older .NET Framework apps and run them in Kubernetes, which is going to help you move them to the cloud and modernize the architecture. You start by running your old monolithic app in a Windows container, then you gradually break features out and run them in .NET Core on Linux containers.

Organizations have been taking that approach with Docker Swarm for a few years now. I cover it in my book Docker on Windows and in my Docker Windows Workshop. It’s a very successful way to do migrations – breaking up monoliths to get the benefits of cloud-native architecture, without a full-on rewrite project.

Now you can do those migrations with Kubernetes. That opens up some interesting new patterns, and the option of running containerized Windows workloads in a managed Kubernetes service in the cloud.

Elton’s not kidding about this support being new. I’m not sure I’d trust it with my production work just yet, but I’m glad to see people working on the problem.
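For a sense of what a containerized Windows workload looks like in practice, here is a minimal, hypothetical Deployment manifest that pins the pod to Windows nodes with a nodeSelector (the names and image are placeholders; kubernetes.io/os is the standard node label in the Kubernetes versions that shipped Windows support):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-web            # hypothetical name for the monolith
spec:
  replicas: 1
  selector:
    matchLabels:
      app: legacy-web
  template:
    metadata:
      labels:
        app: legacy-web
    spec:
      nodeSelector:
        kubernetes.io/os: windows    # schedule onto Windows nodes only
      containers:
        - name: web
          image: myregistry/legacy-aspnet:1.0   # placeholder image
```

Services broken out to .NET Core would run in the same cluster, just scheduled onto Linux nodes instead.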


Explaining Column Statistics

Bert Wagner takes us through column statistics in SQL Server:

Statistics are the primary metadata used by the query optimizer to help estimate the costs of retrieving data for a specific query plan.

The reason SQL Server uses statistics is to avoid having to calculate information about the data during query plan generation. For example, you don’t want to have the optimizer scan a billion row table to learn information about it, only to then scan it again when executing the actual query.

Instead, it’s preferable to have those summary statistics pre-calculated ahead of time. This allows the query optimizer to quickly generate and compare multiple candidate plans before choosing one to actually execute.

These statistics aren’t perfect, but life is almost always better when you have accurate, up-to-date statistics on relevant columns.
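If you want to see what the optimizer is working from, you can inspect the statistics objects directly. A quick sketch, with illustrative table and statistic names:

```sql
-- List the statistics objects on a table, with age and sample size.
SELECT s.name, s.auto_created, sp.last_updated, sp.rows, sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.SomeTable');

-- Show the header, density vector, and histogram for one statistic.
DBCC SHOW_STATISTICS ('dbo.SomeTable', 'IX_SomeTable_SomeColumn');

-- Refresh a table's statistics with a full scan if they have gone stale.
UPDATE STATISTICS dbo.SomeTable WITH FULLSCAN;
```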


Microsoft ML Server 9.4

Jeroen Ter Heerdt announces Microsoft Machine Learning Server 9.4:

Today we’re excited to announce our latest Microsoft Machine Learning Server 9.4 release, which addresses popular customer requests as well as developments in the R and Python community.

Microsoft Machine Learning Server is your flexible enterprise platform for analyzing data at scale, building intelligent apps, and discovering valuable insights across your business, with full support for Python and R. Machine Learning Server meets the needs of all constituents of the process – from data engineers and data scientists to line-of-business programmers and IT professionals. It offers a choice of languages and features algorithmic innovation that brings the best of the open source and proprietary worlds together.

This is the best way to bind new versions of R and Python to your SQL Server ML Services installation.


Deploying a Big Data Cluster

Mohammad Darab takes us through the Big Data Cluster deployment process using Azure Data Studio:

I’ve been “playing around” with Big Data Clusters for some time now and CTP 3.2 is way ahead when it comes to streamlining the BDC deployment process. You can check out my 4-part series on deploying BDC on AKS to see how cumbersome the process used to be. New in CTP 3.2, you can deploy a BDC on AKS (an existing cluster OR a new cluster) using an Azure Data Studio notebook. Let’s see how.

Click through for instructions. It was rather smart of Microsoft to release the instructions as a notebook.


Power BI Screen Usage Comparison

Gilbert Quevauvilliers compares the original Power BI service UI to the new one in terms of how much of the screen is available to our dashboard rather than the service UI:

I have seen quite a lot of chatter with regards to the Power BI Service new look and feel, which I personally really like.

People were saying that whilst they like the look and feel, it appears to take up more screen real estate. Others would prefer the tabs along the bottom, as Excel does.

I took it upon myself to do a quick and possibly quite simple test to see if the new look and feel makes the report page smaller or larger.

The result was a bit of a surprise to me.


Nowcasting Unemployment

Peter Ellis takes us through an attempt to perform near-term projection of Australian unemployment rates based on macroeconomic indicators:

“Leading” in this case will have to mean pretty fast, because the official unemployment stats in Australia come out from the Australian Bureau of Statistics (ABS) with admirable promptitude given the complexity of managing the Labour Force Survey. ABS Series 6202.0 – the monthly summary from the Labour Force Survey – comes out around two weeks after the reference month. Only a few economic variables of interest are available faster than that. In this blog post I look at two candidates for leading information that are readily available in more or less real time – interest rates and stock exchange prices.

One big change in the past decade in this sort of short-term forecasting of unemployment has been to model the transitions between participation, employed and unemployed people, rather than direct modelling of the resulting proportions. This innovation comes from an interesting 2012 paper by Barnichon and Nekarda. I’ve only skimmed this paper, but I’d like to look into how much of the gains they report comes from the focus on workforce transitions, and how much from their inclusion of new information in the form of vacancy postings and claims for unemployment insurance. My suspicion is that these latter two series have powerful new information. I will certainly be returning to vacancy information and job adverts at a later time – these are items which feature prominently for me in my day job at Nous Group in analysing the labour market.

This gets a little deep, but it’s well worth the read. H/T R-bloggers.


An Intro to k-Means Clustering

Holger von Jouanne-Diedrich takes us through an example of how k-means clustering works:

The guiding principles are:

– The distance between data points within clusters should be as small as possible.
– The distance between the centroids (= the centres of the clusters) should be as big as possible.

Because there are too many possible combinations of clusters across all of the data points, k-means follows an iterative approach.

Click through for a demonstration. I appreciate the visualizations of the intermediate steps as well, because they give you an intuitive understanding of what the one-liner function is really doing.
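The post works in R, but the iterative idea translates to any language. Here is a from-scratch sketch in Python (an illustration of the algorithm, not the post's code): assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the assignments stop changing.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal sketch of the k-means loop; points is an (n, d) array.
    Empty clusters are not handled in this illustration."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid,
        # keeping distances within clusters small.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its cluster.
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return labels, centroids

# Toy example: three 2-D blobs.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
                 for c in ([0, 0], [5, 5], [0, 5])])
labels, centroids = kmeans(pts, k=3)
```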
