Press "Enter" to skip to content

Day: April 6, 2017

Re-Incrementing Identity Columns

Kenneth Fisher explains how to change the increment value of a table which already has an identity column:

The first thing to do is remember that all tables are partitioned. Sort of. What we think of as non-partitioned tables are really just tables with a single partition. Every table is listed in sys.partitions and in fact you can use it to quickly see how many rows there are in a table. Since there is no partition scheme/function we can’t do splits or merges, but we can do a SWITCH.

What we are going to do is create a new, virtually identical table, then switch the data from the old table (partition) to the new table (partition). The trick is that while in order to do the switch almost everything has to be identical, the properties of the identity column are part of that almost.

I love these types of solutions:  hacks in the most positive connotation of the term.

Comments closed

Understanding Boosted Trees

Maria Jesus Alonso explains decision trees and their subsequent improvements:

Bagging (or Bootsrap Aggregating), the second prediction technique brought to the BigML Dashboard and API, uses a collection of trees (rather than a single one), each tree built with a different random subset of the original dataset for each model in the ensemble. Specifically, BigML defaults to a sampling rate of 100% (with replacement) for each model. This means some of the original instances will be repeated and others will be left out. Bagging performs well when a dataset has many noisy features and only one or two are relevant. In those cases, Bagging will be the best option.

Random Decision Forests extend the Bagging technique by only considering a random subset of the input fields at each split of the tree. By adding randomness in this process, Random Decision Forests help avoid overfitting. When there are many useful fields in your dataset, Random Decision Forests are a strong choice.

Click through for how boosted trees change this model a bit.

Comments closed

Columnstore Partitioning In SQL Server 2016

Niko Neugebauer demonstrates some performance improvements to partitioned columnstore indexes in SQL Server 2016:

266ms was the partitioned table under SQL Server 2016 (compatibility level 120) while 353ms of the total elapsed time was obtained on SQL Server 2014! This represents a solid 25% improvement

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.

Comments closed

Using OLS To Fit Rational Functions

Srini Kumar and Bob Horton show how to use the lm function to fit functions using the Pade Approximation:

Now we have a form that lm can work with. We just need to specify a set of inputs that are powers of x (as in a traditional polynomial fit), and a set of inputs that are y times powers of x. This may seem like a strange thing to do, because we are making a model where we would need to know the value of y in order to predict y. But the trick here is that we will not try to use the fitted model to predict anything; we will just take the coefficients out and rearrange them in a function. The fit_pade function below takes a dataframe with x and y values, fits an lm model, and returns a function of x that uses the coefficents from the model to predict y:

The lm function does more than just fit straight lines.

Comments closed

TempDB Spill And Checkpoint Writes

Lonny Niederstadt shows that the -k startup option throttles two separate things:

The -k startup option can throttle checkpoint writes, and can throttle tempdb spills.
On my systems, I’ve never seen an overwhelming checkpoint but I’ve seen plenty of overwhelming spills to tempdb.  But are spill writes through the checkpoint mechanism?  If so, then -k would just be throttling checkpoint writes to persistent databases and to tempdb – same function in two contexts.

Let’s take a look.  I’ll look at the same test scenarios I used in my March 29, 2017 blog post: an “insert into #temptable select…”, a “select… into #temptable”, and an index create with sort_in_tempdb.

For further reference, here is his preliminary research into the -k option.

Comments closed

New RTVS Instructions

Ginger Grant has updated her instructions for installing R Tools for Visual Studio and getting R Services to work on SQL Server:

In addition to having an SQL Server 2016 instance with R Server installed, the following components are needed on a client machine

The Comprehensive R Archive Network

RStudio (optional)

Visual Studio 2015 R Tools

This list is a change from the previous list I have provided as RTVS contains an installation of R Client, there is no need to download that as well. You do not need to download Microsoft R Open if you are using R Server either.  Once RTVS is installed, there is a menu option on the R Tools window. Selecting Install R Client from the menu will handle the information. Unfortunately, there is no change to the menu option once R Client is installed, it always looks like you should install it.  To find out if R Client has been installed, look in the Workspaces window.

In other words, fewer dependencies and an easier installation process.  Read the whole thing to avoid RevoScaleR errors in your code post-upgrade.

Comments closed

Doubly Linked Lists And Bugs

Ewald Cress explains a bug in DBCC PAGE:

Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.

Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.

Very interesting article, and also a good juxtaposition of supported, “production-safe” code versus undocumented processes.

Comments closed

Connecting Powershell To Docker Containers

Andrew Pruski shows how to open a Powershell session on a container:

One of the questions that I was asked at SQL Saturday Iceland was “how can I view the filesystem within a container?”.

This is a great question as one of the things that people find off-putting about containers is their opaqueness. It’s not obvious where everything lives within the container or how we can view the files within it.

Thankfully there’s a simple docker command that allows us to open a powershell session within a container, that command is docker exec.

For Linux-based containers, /bin/bash (or your favorite shell, if it’s installed) serves as its analog.

Comments closed