Explaining Tree-Based Algorithms

Stephanie Glen takes us through quick explanations of decision trees, random forests, and gradient boosting:

The three methods are similar, with a significant amount of overlap. In a nutshell:

– A decision tree is a simple decision-making diagram.
– Random forests are a large number of trees, combined (using averages or “majority rules”) at the end of the process.
– Gradient boosting machines also combine decision trees, but start the combining process at the beginning, instead of at the end.

Read on for more details. All three are useful algorithms serving similar but slightly different purposes.
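
To make the "combine at the end" versus "combine from the beginning" contrast concrete, here is a minimal pure-Python sketch. It is not taken from Stephanie's post: it assumes a toy one-dimensional regression problem with squared-error loss, uses one-split trees ("stumps") as the base learner, and the names `fit_stump`, `fit_forest`, and `fit_boosting` as well as the data are made up for illustration. It also averages predictions rather than using "majority rules" voting, which would be the classification analogue.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a 'stump') on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all inputs identical): predict the overall mean.
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train trees independently on bootstrap samples; average at the end."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


def fit_boosting(xs, ys, n_trees=25, learning_rate=0.5):
    """Train trees one after another, each on the errors left so far."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: a staircase with two jumps, which a single stump cannot capture.
xs = list(range(10))
ys = [1, 1, 1, 3, 3, 3, 3, 6, 6, 6]

stump = fit_stump(xs, ys)
forest = fit_forest(xs, ys)
boosted = fit_boosting(xs, ys)

for name, model in [("stump", stump), ("forest", forest), ("boosted", boosted)]:
    print(name, round(mse(model, xs, ys), 4))
```

The forest loop never looks at what earlier trees predicted, so its trees could be trained in parallel. The boosting loop cannot be, because each tree depends on the residuals of everything before it.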

Query Store Required Permissions

Andreas Wolter takes us through exactly which permissions are required for Query Store to work:

Typically, there are 3 aspects to the work with the Query Store, which can be reflected in roles:
1) Configuration – turning Query Store on and off, clearing the contents, flushing its contents to disk and changing its settings.
2) Viewing the reports or using the DMVs to analyze the Query Store contents (queries, plans and wait statistics) to gain insights, but not necessarily having the authority to change anything.
3) Actively changing plans by forcing or un-forcing them, based on the information obtained from (2).

This is a nice overview of the problem and a fair amount of the solution.

Kubernetes on Windows

Elton Stoneman helps us get started with Kubernetes on Windows boxes:

Now you can take older .NET Framework apps and run them in Kubernetes, which is going to help you move them to the cloud and modernize the architecture. You start by running your old monolithic app in a Windows container, then you gradually break features out and run them in .NET Core on Linux containers.

Organizations have been taking that approach with Docker Swarm for a few years now. I cover it in my book Docker on Windows and in my Docker Windows Workshop. It’s a very successful way to do migrations – breaking up monoliths to get the benefits of cloud-native architecture, without a full-on rewrite project.

Now you can do those migrations with Kubernetes. That opens up some interesting new patterns, and the option of running containerized Windows workloads in a managed Kubernetes service in the cloud.

Elton’s not kidding about this support being new. I’m not sure I’d trust it with my production work just yet, but I’m glad to see people working on the problem.

Explaining Column Statistics

Bert Wagner takes us through column statistics in SQL Server:

Statistics are the primary metadata used by the query optimizer to help estimate the costs of retrieving data for a specific query plan.

SQL Server uses statistics to avoid having to calculate information about the data during query plan generation. For example, you don’t want the optimizer to scan a billion-row table to learn about it, only to scan it again when executing the actual query.

Instead, it’s preferable to have those summary statistics pre-calculated ahead of time. This allows the query optimizer to quickly generate and compare multiple candidate plans before choosing one to actually execute.

These statistics aren’t perfect, but life is almost always better when you have accurate, up-to-date statistics on relevant columns.
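
To illustrate why pre-calculated summaries are so much cheaper than looking at the data, here is a small Python sketch. It is a simplification for illustration only, not SQL Server's actual histogram format or estimation algorithm: `build_histogram` and `estimate_rows_le` are hypothetical names, and the histogram is a plain equi-depth one with linear interpolation inside each step. The expensive pass over the column happens once, ahead of time; each later estimate reads only a handful of numbers.

```python
def build_histogram(values, n_steps=4):
    """Scan the column once and summarise it as equi-depth steps.

    Returns (minimum, steps), where each step is (upper_bound, row_count).
    """
    ordered = sorted(values)
    n = len(ordered)
    steps = []
    start = 0
    for i in range(1, n_steps + 1):
        end = (i * n) // n_steps
        if end > start:
            steps.append((ordered[end - 1], end - start))
            start = end
    return ordered[0], steps


def estimate_rows_le(histogram, value):
    """Estimate how many rows satisfy `column <= value` from the summary alone."""
    minimum, steps = histogram
    total = 0.0
    lower = minimum  # lower edge of the step being examined
    for upper, count in steps:
        if value >= upper:
            # The whole step qualifies.
            total += count
        elif value > lower:
            # Partial step: assume values are spread evenly within it.
            total += count * (value - lower) / (upper - lower)
            break
        else:
            break
        lower = upper
    return total


# Hypothetical column: 1,000 rows holding the values 1 through 1000.
column = list(range(1, 1001))
histogram = build_histogram(column)          # done ahead of time
estimate = estimate_rows_le(histogram, 600)  # done during "plan generation"
actual = sum(1 for v in column if v <= 600)  # what a full scan would find
print(estimate, actual)
```

The even-spread assumption inside a step is exactly where estimates like this go wrong on skewed data, which is one reason stale or missing statistics can lead to poor plans.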

Microsoft ML Server 9.4

Kevin Feasel

Jeroen Ter Heerdt announces Microsoft Machine Learning Server 9.4:

Today we’re excited to announce our latest Microsoft Machine Learning Server 9.4 release, which addresses popular customer requests as well as developments in the R and Python community.

Microsoft Machine Learning Server is your flexible enterprise platform for analyzing data at scale, building intelligent apps, and discovering valuable insights across your business with full support for Python and R. Machine Learning Server meets the needs of all constituents of the process – from data engineers and data scientists to line-of-business programmers and IT professionals. It offers a choice of languages and features and algorithmic innovation that brings the best of open source and proprietary worlds together.

This is the best way to bind new versions of R and Python to your SQL Server ML Services installation.

Deploying a Big Data Cluster

Mohammad Darab takes us through the Big Data Cluster deployment process using Azure Data Studio:

I’ve been “playing around” with Big Data Clusters for some time now and CTP 3.2 is way ahead when it comes to streamlining the BDC deployment process. You can check out my 4-part series on deploying BDC on AKS to see how cumbersome the process used to be. New in CTP 3.2, you can deploy a BDC on AKS (an existing cluster OR a new cluster) using an Azure Data Studio notebook. Let’s see how.

Click through for instructions. It was rather smart of Microsoft to release the instructions as a notebook.

Power BI Screen Usage Comparison

Gilbert Quevauvilliers compares the original Power BI service UI to the new one in terms of how much of the screen is available to our dashboard rather than the service UI:

I have seen quite a lot of chatter with regards to the Power BI Service new look and feel, which I personally really like.

People were saying that whilst they like the look and feel, it appears to take up more screen real estate. Others prefer having the tabs along the bottom, as Excel does.

I took it upon myself to do a quick and possibly quite simple test to see if the new look and feel makes the report page smaller or larger.

The result was a bit of a surprise to me.


July 2019