MRAN Changes and a Survey

David Smith discusses potential changes to MRAN:

As CRAN has grown and changes to packages have become more frequent, maintaining MRAN is an increasingly resource-intensive process. We’re contemplating changes, such as altering the frequency of snapshots or thinning the archive of snapshots that haven’t been used. But before we do that, we’d like to hear from the community first. Have you used MRAN snapshots? If so, how are you using them? How many different snapshots have you used, and how often do you change that up? Please leave your feedback at the survey link below by June 14, and we’ll use the feedback we gather in our decision-making process. Responses are anonymous, and we’ll summarize the responses in a future blog post. Thanks in advance!

Please take the survey as well. If you’ve used SQL Server Machine Learning Services (or SQL Server R Services), you’ve used MRAN.

Debugging a Pivot

Ed Elliott takes us through problems with the PIVOT statement:

If you have a PIVOT query and it isn’t returning the data you expect, what can you do to troubleshoot it? The thing to do is to break it down into its constituent parts. First, let’s take a look at a query and see what we can do to help.

Click through for potential problems and their solutions.
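
To make that idea concrete, here is a minimal sketch of the decomposition (my own example against an assumed dbo.Sales table, not the query from Ed’s post):

    -- The pivot we want to debug: yearly sales totals per product.
    SELECT Product, [2018], [2019]
    FROM
    (
        SELECT Product, SaleYear, Amount
        FROM dbo.Sales
    ) AS src
    PIVOT
    (
        SUM(Amount)
        FOR SaleYear IN ([2018], [2019])
    ) AS pvt;

    -- Constituent part 1: run the source query alone and eyeball the rows
    -- actually feeding the pivot.
    SELECT Product, SaleYear, Amount
    FROM dbo.Sales;

    -- Constituent part 2: reproduce the aggregation with a plain GROUP BY
    -- and compare the numbers against the pivoted output.
    SELECT Product, SaleYear, SUM(Amount) AS Amount
    FROM dbo.Sales
    GROUP BY Product, SaleYear;

If the GROUP BY numbers look right but the pivot’s don’t, the problem is usually in the column list or the IN clause rather than the data.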

PolyBase on Linux

I have a post showing how to set up PolyBase on Linux:

Now that we have SQL Server on Linux installed, we can begin to install PolyBase. There are some instructions here but because we started with the Docker image, we’ll need to do a little bit of prep work. Let’s get our shell on.

First, run docker ps to figure out your container ID. Mine is 818623137e9f. From there, run the following command, replacing the container ID with a reasonable facsimile of yours.

I actually fired up my copy of SimCity 2000 to take a screenshot for this post. The things I do for my audience.
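
One related note for anyone following along: after the installation itself, PolyBase still has to be enabled at the instance level. That part is ordinary T-SQL (the standard sp_configure step from the documentation):

    -- Enable PolyBase on the instance, then apply the change.
    EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
    RECONFIGURE;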

Making SpeedPASSes Better

Wayne Sheffield shares some useful tips for making the SpeedPASS experience better:

As things were wrapping up for our event last year, there was an important change made at the SQLSaturday site for dealing with SpeedPASSes. Previously, the admission ticket was sized differently from everything else, which created hassles in cutting and in using perforated paper (and is why there were custom-printed SpeedPASSes – so that we could make them fit). Now, however, all of the labels are the same size. This means that perforated paper can be used to print out the SpeedPASS PDF files that are generated at the SQLSaturday site.

Additionally, I was looking for ways to automate some of the manual process of merging the PDF files described in the previous post, especially the work necessary in the Excel spreadsheet. The end result is a new process that we used at our event, and I’m sharing it with everyone to help them out.

Read on for tips, including perforated paper and PowerShell scripts tying it all together.

Avoiding DONE Tokens in Loops

Emanuele Meazzo shows one reason why loops can be so much slower in T-SQL:

Not everybody knows that SQL Server sends a DONE token to the client each time a SQL statement completes (so, everything except variable declarations); for the query above, you can track it with Extended Events via the “SQL Statement Completed” event.
What happens in a loop? For each statement that completes, a token is sent, which means that for this loop, which contains 3 statements (the WHILE loop itself is a statement), 15 million tokens are sent to the client.

There can be a drastic difference, as Emanuele shows.
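
For reference, a loop of the shape Emanuele describes would look something like this (my own sketch with assumed iteration counts; his post has the real query and the Extended Events session):

    -- Three statements repeat per iteration: the WHILE check itself plus
    -- the two SETs. At 5 million iterations, that is roughly 15 million
    -- DONE tokens chattering back to the client.
    DECLARE @i INT = 0, @x INT;   -- declarations send no tokens
    WHILE @i < 5000000            -- statement 1
    BEGIN
        SET @i += 1;              -- statement 2
        SET @x = @i % 2;          -- statement 3
    END;

One mitigation worth knowing about (whether or not it’s the one Emanuele lands on): per Microsoft’s documentation, SET NOCOUNT ON stops SQL Server from sending DONE_IN_PROC messages for each statement in a stored procedure, which matters a great deal for chatty loops like this.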

SQL Server 2019 CTP 3.0

The SQL Server team has announced the latest CTP for SQL Server 2019:

Big data clusters
– Scale out by supporting deployment configurations with an increased number of SQL Server instances in the compute pool. You can now specify up to 4 instances in the compute pool for optimal performance of your queries against data pool, storage pool, or other external data sources.
– The mssqlctl utility includes updates to ease the big data cluster management experience with enhancements to the login experience. There is also a new command to list the cluster endpoints.
– Persistent volumes abstract the details of how the storage is provided and how it’s consumed. In this release, we’re enhancing the supported storage configurations by enabling you to customize storage classes independently for logs and data. With these changes, we also consolidated the storage configurations for big data components, so that the number of persistent volume claims for a big data cluster has been reduced for a default minimum configuration cluster.

There are a few other changes announced in this CTP. Now that we’re at 3.0, the light is at the end of the tunnel.

Simplifying Imported Biml

David Stein takes us through an example of what imported Biml looks like and how you can make it better:

This post is part of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so. In the previous article, I covered how to import existing packages into Biml using BimlExpress. In this article, we’ll examine that Biml and simplify it in preparation for automation. If you’ve been reading along in this series, you’ll remember that we started with a single SSIS package which extracts data from a SQL source and creates a CSV file with quote identifiers. 

Read on to see the example and David’s first steps toward improving it.

Foreign Key Checks and Index Usage Stats

Marek Masko walks through an attendee question:

Last week I was speaking at the SQLDay 2019 conference in Wroclaw, Poland. My session was about things you should focus on when working with legacy databases. One of the topics I discussed concerned the collection and aggregation of database usage statistics (mainly for indexes and stored procedures).

After the session, one of the attendees came to me and asked if I knew that actions which trigger foreign key checks, and which use indexes underneath, don’t update the index usage stats DMV. I was very surprised because, in my opinion, such behavior would be a huge SQL Server defect. So I decided to check it out…

Read on for Marek’s explanation and demo.
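
If you want to poke at this yourself before reading, a minimal repro would look something like the following (table and index names are my own; I won’t spoil what the DMV reports):

    -- A parent/child pair; deleting a parent row forces SQL Server to
    -- verify no child rows reference it, reading the index on ParentId.
    CREATE TABLE dbo.Parent (Id INT PRIMARY KEY);
    CREATE TABLE dbo.Child
    (
        Id INT PRIMARY KEY,
        ParentId INT NOT NULL REFERENCES dbo.Parent (Id)
    );
    CREATE INDEX IX_Child_ParentId ON dbo.Child (ParentId);

    INSERT INTO dbo.Parent (Id) VALUES (1);
    DELETE FROM dbo.Parent WHERE Id = 1;   -- triggers the foreign key check

    -- Now see whether that read shows up in the usage stats DMV.
    SELECT OBJECT_NAME(ius.object_id) AS table_name,
           i.name AS index_name,
           ius.user_seeks, ius.user_scans, ius.user_lookups
    FROM sys.dm_db_index_usage_stats AS ius
        JOIN sys.indexes AS i
            ON i.object_id = ius.object_id
            AND i.index_id = ius.index_id
    WHERE ius.database_id = DB_ID();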

Non-Linear Classifiers with Support Vector Machines

Rahul Khanna continues a series on support vector machines:

In this blog post, we will look at a detailed explanation of how to use SVM for complex decision boundaries and build non-linear classifiers using SVM. The primary method for doing this is by using kernels.

In a linear SVM, we find the margin-maximizing hyperplane with features Xi. Similarly, in logistic regression, we also try to find the hyperplane which minimizes logistic loss with features Xi. Most often, when we use both of these techniques, the results are the same. But a linear SVM (or, for the same reason, a logistic regression) would fail where there is a need for complex or non-linear decision boundaries. These types of boundaries are achieved by SVM using kernels. So let us understand how SVM creates non-linear boundaries using kernels.

Read on to see how it works.
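
For reference, the standard textbook form of the kernel trick (my notation, not necessarily Rahul’s): the decision function keeps the linear SVM’s shape but swaps the inner product for a kernel,

    f(x) = \operatorname{sign}\left( \sum_{i} \alpha_i \, y_i \, K(x_i, x) + b \right),
    \qquad
    K(x, x') = \exp\left( -\gamma \, \lVert x - x' \rVert^2 \right)

where the second equation is the popular RBF (Gaussian) kernel. With K(x_i, x) taken as the plain inner product, you recover the linear SVM; non-linear kernels let the same machinery draw curved boundaries in the original feature space.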

Vectors for Programmers

John Mount has a couple of videos available:

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with which ideas programmers find interesting about vectors, which concepts they consider safe starting points, and how to condense and present the material.

Click through for the links, one with Python examples and the other with R examples.
