Press "Enter" to skip to content

Month: May 2019

PolyBase on Linux

I have a post showing how to set up PolyBase on Linux:

Now that we have SQL Server on Linux installed, we can begin to install PolyBase. There are some instructions here but because we started with the Docker image, we’ll need to do a little bit of prep work. Let’s get our shell on.

First, run docker ps to figure out your container ID. Mine is 818623137e9f. From there, run the following command, replacing the container ID with a reasonable facsimile of yours.

I actually fired up my copy of SimCity 2000 to take a screenshot for this post. The things I do for my audience.

Comments closed

Making SpeedPASSes Better

Wayne Sheffield shares some useful tips for making the SpeedPASS experience better:

As things were wrapping up for our event last year, there was an important change made at the SQLSaturday site for dealing with SpeedPASSes. Previously, the admission ticket was sized differently from everything else, which created hassles in cutting and in using perforated paper (and why there were custom-printed SpeedPASSes – so that we could make them fit). However, the change was that all of the labels are now the same size. This means that perforated paper can now be used to print out the SpeedPASS PDF files that are generated at the SQLSaturday site.

Additionally, I was looking for ways to automate some of the manual process of merging the PDF files as described in the previous post, especially with the work necessary in the Excel spreadsheet. The end result is that we now have a new process that we used, and I’m sharing it with everyone to help them out.

Read on for tips including perforated paper and Powershell scripts tying it all together.

Comments closed

Avoiding DONE Tokens in Loops

Emanuele Meazzo shows one reason why loops can be so much slower in T-SQL:

Not everybody knows that SQL Server sends a DONE Token to the client each time that a SQL statement completes (so, everything except variable declarations); For the query above you can basically track it with extended events by tracking the “SQL Statement Completed” event.
What happens in a loop? For each statement that’s completed, a token is sent, which means that for this loop that contains 3 statements (the WHILE loop itself it’s a statement) 15 Million tokens are sent to the client.

There can be a drastic difference, as Emanuele shows.

Comments closed

SQL Server 2019 CTP 3.0

The SQL Server team has announced the latest CTP for SQL Server 2019:

Big data clusters
– Scale out by supporting deployment configurations with an increased number of SQL Server instances in the compute pool. You can now specify up to 4 instances in the compute pool for optimal performance of your queries against data pool, storage pool, or other external data sources.
The mssqlctl utility includes updates to ease the big data cluster management experience with enhancements to the login experience. There is also a new command to list the cluster endpoints.
Persistent volumes abstract the details of how the storage is provided and how it’s consumed. In this release, we’re enhancing the supported storage configurations by enabling you to customize storage classes independently for logs and data. With these changes, we also consolidated the storage configurations for big data components, so that the number of persistent volume claims for a big data cluster has been reduced for a default minimum configuration cluster.

There are a few other changes announced in this CTP. Now that we’re at 3.0, the light is at the end of the tunnel.

Comments closed

Simplifying Imported Biml

David Stein takes us through an example of what imported Biml looks like and how you can make it better:

This post is part of a series, Short Cuts to Biml Mastery which shows you how to learn BimlScript through a series of short cuts and remain productive while doing so. In the previous article, I covered how to import existing packages into Biml using BimlExpress. In this article, we’ll examine that Biml and simplify it in preparation for automation. If you’ve been reading along in this series, you’ll remember that we started with a single SSIS package which extracts data from a SQL source and creates a CSV file with quote identifiers. 

Read on to see the example and David’s first steps toward improving it.

Comments closed

Foreign Key Checks and Index Usage Stats

Marek Masko walks through an attendee question:

Last week I was speaking at SQLDay 2019 conference in Wroclaw, Poland. My session was about things you should focus on during work with Legacy Databases. One of the topics I discussed was concerning the database usage statistics collection and aggregation (mainly indexes and stored procedures).

After the session, one of the attendees came to me and ask me if I know that actions causing check of Foreign Keys and that use indexes underneath, don’t update index usage stats DMV. I was very surprised because, in my opinion, such behavior would be a huge SQL Server defect. So I decided to check it out…

Read on for Marek’s explanation and demo.

Comments closed

Non-Linear Classifiers with Support Vector Machines

Rahul Khanna continues a series on support vector machines:

In this blog post, we will look at a detailed explanation of how to use SVM for complex decision boundaries and build Non-Linear Classifiers using SVM. The primary method for doing this is by using Kernels.

In linear SVM we find margin maximizing hyperplane with features Xi’s . Similarly, in Logistic regression, we also try to find the hyperplane which minimizes logistic loss with features Xi’s. Most often when we use both these techniques the results are the same. But linear SVM or for the same reason a logistic regression would fail where there is a need to have complex or non-linear decision boundaries. These types of boundaries are then achieved by SVM using Kernels. So let us understand how SVM creates non-linear boundaries using Kernels

Read on to see how it works.

Comments closed

Vectors for Programmers

John Mount has a couple of videos available:

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with what ideas do programmers find interesting about vectors, what concepts do they consider safe starting points, and how to condense and present the material.

Click through for the links, one with Python examples and the other with R examples.

Comments closed

Data Type Conversions in Predicates

Bert Wagner takes us through a troublesome table design:

This table stores data for an application that has many different types of Pages. Each Page stores different types of data, but instead of creating a separate table for each type, we store all the different data in the varchar DataValue column and maintain the original data type in the DataType column.

This structure reduces the complexity required for maintaining our database (compared to creating possibly hundreds of tables, one for each PageName) and makes querying easier (only need to query one table). However, this design could also lead to some unexpected query results.

This is your daily reminder that an attribute should be a thing which describes an entity, not one of multiple things.

Comments closed