Month: May 2017

Using Plan Guides

Kendra Little has a video showing how to use plan guides:

If you need to add, remove, or replace hints from ad-hoc queries where you can’t change the code, plan guides can help. See a demo of removing a query hint from parameterized TSQL run from an application, and get tips on how to make your plan guides work in SQL Server.

The code from the demo is here. Links for more info are below the video. Have fun!

Click through to watch the video, or you can catch the podcast version.

Deploying Packages To SQL Server R Services

Tracy Boggiano has a PowerShell script to deploy packages to an instance running SQL Server R Services:

Somehow I have become the R DBA at my job which I don’t mind, I plan on taking Microsoft’s Professional Program on Data Science to be familiar with it.  But recently I’ve had to upload files to our R servers which the first time wasn’t too bad.  Copy these files to six different servers but come the second time around it became apparent that the Predictive Analytics Manager was going to be asking me to do this more frequently than I wanted to do it manually.  So I wrote a quick PowerShell function to take care of this added to our module we use in house.  It unzips the file provided to the correct location.  It does assume you have administrative rights to your server i.e. you can use the admin shares (c$) for example on the server.  You will need to get the function Get-CMSHost from my Running SQL Scripts Against Multiple Servers Using PowerShell post to run the code below.
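The unzip-to-every-server step described above can be sketched in Python. This is not Tracy's PowerShell function: the name `deploy_package` and its parameters are hypothetical. It assumes that each target library folder is reachable as a normal path, for example through an admin share.

```python
import zipfile
from pathlib import Path


def deploy_package(zip_path, library_dirs):
    """Extract one package archive into each target library directory.

    zip_path and library_dirs are illustrative parameters, not names
    taken from the original script.
    """
    deployed = []
    for library_dir in library_dirs:
        target = Path(library_dir)
        # Create the folder if it is missing so extraction cannot fail on it.
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        deployed.append(target)
    return deployed
```

A call would pass the archive and one library path per server, such as a list of UNC paths built from your server names. The exact library folder depends on the installation, so it is left for the caller to supply.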

Click through for the script.  This is particularly useful when you are deploying in-house packages and don’t want to set up a miniCRAN.

Installing Power BI Report Server

Adam Saxton has a video showing how to install and configure Power BI Report Server:

In this video, I look at how to install and configure the May 2017 Preview of Power BI Report Server. Power BI Report Server has a new standalone install experience and this product allows for Power BI reports to be rendered in the web portal along with paginated reports.

This will get you started with the new version.

I was really excited about this preview until I realized that, for now, it only works for Analysis Services data sources.

Azure AD On Azure SQL DB

Arun Sirpal shows how to set up Azure SQL Database to use Azure Active Directory accounts:

I think it is important to highlight a couple of points, specifically the requirement for ADALSQL.DLL and the proper setup of AD. I will cover both below and reference some links; please follow them, as this lays the foundation for you.

ADALSQL.DLL

You need ADALSQL.DLL which is part of the latest SQL Server Management Studio (SSMS) to test access. This stands for Active Directory Authentication Library for SQL Server.

This goes through some of the issues Arun had setting everything up and provides workarounds and explanations.

Check Your Transactions

John Morehouse talks about a mistake he made:

The other day I had to update some records, in Production.  I’m a firm believer in using explicit transactions and double checking things before committing a transaction.  This helps ensure things go as expected.  This also allows me a way to roll back the changes if they don’t.  It happens.

However, this means that I have to COMMIT said explicit transaction.  And not go to lunch without doing so.

Can you see my mistake?  I bet you can.

Fortunately, it sounds like it wasn’t a critical problem.  If you want to check for open transactions, Jack Vamvas has a couple methods.

Building Temporal Tables

Bert Wagner shows how to create a temporal table in SQL Server 2016:

I want to make my life easier by using temporal tables! Take my money and show me how!

I’m flattered by your offer, but since we are good friends I’ll let you in on these secrets for free.

First let’s create a temporal table. I’m thinking about starting up a car rental business, so let’s model it after that:

There are some places where temporal tables can get better (particularly around feeling more like a type 2 slowly changing dimension), but I’m pretty happy with this feature.

Finding Partition Boundaries

Kenneth Fisher shows how to find the min and max values for a partition:

So what does it do? Per BOL

Returns the partition number into which a set of partitioning column values would be mapped for any specified partition function in SQL Server 2016.

So it basically tells us which partition any given row is in. This can be particularly handy at times. For example, if you want to know the min and max values of a column per partition.

Read on for a couple scripts which use $Partition.

The Power Of The Stacked Ensemble

Funda Gunes describes the value of ensemble models in data science competitions:

A simple way to enhance diversity is to train models by using different machine learning algorithms. For example, adding a factorization model to a set of tree-based models (such as random forest and gradient boosting) provides a nice diversity because a factorization model is trained very differently than decision tree models are trained. For the same machine learning algorithm, you can enhance diversity by using different hyperparameter settings and subsets of variables. If you have many features, one efficient method is to choose subsets of the variables by simple random sampling. Choosing subsets of variables could be done in a more principled fashion that is based on some computed measure of importance, which introduces the large and difficult problem of feature selection.

In addition to using various machine learning training algorithms and hyperparameter settings, the KDD Cup solution shown above uses seven different feature sets (F1-F7) to further enhance the diversity.  Another simple way to create diversity is to generate various versions of the training data. This can be done by bagging and cross validation.
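Two of the diversity tricks mentioned above, random feature subsets and bagged versions of the training data, can be sketched with the Python standard library. The function names and parameters are hypothetical and are not taken from the KDD Cup solution; each subset and each index list would feed one base model of the ensemble.

```python
import random


def random_feature_subsets(features, n_subsets, subset_size, seed=0):
    """Draw feature subsets by simple random sampling without replacement."""
    rng = random.Random(seed)
    return [sorted(rng.sample(features, subset_size)) for _ in range(n_subsets)]


def bootstrap_indices(n_rows, seed=0):
    """Row indices for one bagged training set, sampled with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]
```

Fixing the seed keeps the subsets reproducible, which matters when the same base models must be rebuilt later for scoring.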

I think there’s a pretty strong contrast between competitions and general practice. In a competition, you’re doing everything you can to eke out a higher prediction score; in practice, you’re aiming to balance a “good enough” prediction with hardware/time requirements and code complexity, so the model selection process can be quite different.

Lessons From A Data Analysis Exercise

Bill Schmarzo has an interesting post summarizing the results of an MBA class exercise involving data analysis:

Lesson #2:  Quick and dirty visualizations are critical in understanding what is happening in the data and establishing hypotheses to be tested. For example, the data visualization in Figure 1 quickly highlighted the importance of offensive rebounds and three-point shooting percentage in the Warriors’ overtime losses.

Read the whole thing.

Understanding Bootstrap Aggregating (Bagging)

Gabriel Vasconcelos explains the bagging technique:

The name bagging comes from bootstrap aggregating. It is a machine learning technique proposed by Breiman (1996) to increase stability in potentially unstable estimators. For example, suppose you want to run a regression with a few variables in two steps. First, you run the regression with all the variables in your data and select the significant ones. Second, you run a new regression using only the selected variables and compute the predictions.

This procedure is not wrong if your problem is forecasting. However, this two step estimation may result in highly unstable models. If many variables are important but individually their importance is small, you will probably leave some of them out, and small perturbations on the data may drastically change the results.

Read on to see how bootstrap aggregation works and how it solves this instability problem.
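As a rough illustration of the idea, here is a minimal bagging sketch using only the Python standard library. It is not Gabriel's code. The base learner is a simplified stand-in for the two-step procedure: it keeps only the single column most correlated with the target and fits a one-variable line on it, which makes it similarly sensitive to small changes in the data. All function names are hypothetical.

```python
import random
import statistics


def fit_best_single_predictor(rows, targets):
    """Unstable base learner: select the column with the highest squared
    correlation with the target, then fit a simple least-squares line on it."""
    mean_y = statistics.fmean(targets)
    syy = sum((y - mean_y) ** 2 for y in targets)
    best = None
    for j in range(len(rows[0])):
        xs = [row[j] for row in rows]
        mean_x = statistics.fmean(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
        if sxx == 0 or syy == 0:
            continue  # a constant column or constant target carries no signal
        score = sxy * sxy / (sxx * syy)
        if best is None or score > best[0]:
            slope = sxy / sxx
            best = (score, j, slope, mean_y - slope * mean_x)
    if best is None:
        return lambda row: mean_y  # fall back to predicting the mean
    _, column, slope, intercept = best
    return lambda row: intercept + slope * row[column]


def bagged_predictor(rows, targets, n_models=50, seed=0):
    """Bagging: refit the base learner on bootstrap resamples of the rows
    and average the resulting predictions."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(
            fit_best_single_predictor([rows[i] for i in idx],
                                      [targets[i] for i in idx])
        )
    return lambda row: statistics.fmean(model(row) for model in models)
```

Each resample may select a different column, so the averaged prediction blends several candidate models instead of committing to whichever one a single fit happened to pick.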
