Press "Enter" to skip to content

Author: Kevin Feasel

DataRows Everywhere

Shane O’Neill shows how to deal with .NET DataRows in Powershell:

Now I don’t know about you but when I query stuff in a SQL database, it’s to do something to/with the results. They could be a list of servers that I monitor, they could be a list of databases that I want to check the recovery model of, or it could be a list of tables that I want to see how much space they are using. The main point is that I want to do something with the results.

But for this simple case, I just want to list out the customer name from this table. Simple? Yes, but this is just a test case to prove a point.

So let’s PowerShell this!

The moral of the story is to Get-Member early and Get-Member often.

Comments closed

Connection Retry Intervals

Arvind Shyamsundar explains a recent Management Studio change:

One such change SSMS got for free is the connection resiliency logic within the SqlConnection.Open() method. To improve the default experience for clients which connect to Azure SQL Database, the above method will (in the case of initial connection errors / timeouts) now retry 1 time after sleeping for 10 seconds. These numbers are configurable by properties called ConnectRetryCount (default value 1) and ConnectRetryInterval (default value 10 seconds.) The previous versions of the SqlConnection class would not automatically retry in cases of connection failure.

In general, because of these changes, transient errors (across slow networks or when working with Azure SQL Database) are less frequent. However, when you consider that a lot of SSMS users still use it with ‘regular’ SQL Server – either in a VM in the cloud or on-premises, there is a subtle but distinct impact of these changes which may affect administrators of ‘regular’ SQL Server databases.

Read on for details on why Management Studio hangs for 10-second periods, and see how to disable connection retry yourself.

Comments closed

Keeping Up To Date

Daniel Janik shows that query-level wait stats depend on the version of Management Studio:

The waitstats don’t appear at all in my older Surface which has a newer version of SQL. So what is 4202.2?  It’s a refresh for Master Data Services and R. Could that really have broken my query plan waitstats?

I doubt it but maybe. I updated to make the two equal. Did the waitstats go away?

No.

When reading the solution, it seems obvious, but this is a good reminder that there are a lot of moving parts here, and one of the early troubleshooting steps for “It works here, so why not over here?” types of issues is to make sure software is at the same version number.

Comments closed

Re-Incrementing Identity Columns

Kenneth Fisher explains how to change the increment value of a table which already has an identity column:

The first thing to do is remember that all tables are partitioned. Sort of. What we think of as non-partitioned tables are really just tables with a single partition. Every table is listed in sys.partitions and in fact you can use it to quickly see how many rows there are in a table. Since there is no partition scheme/function we can’t do splits or merges, but we can do a SWITCH.

What we are going to do is create a new, virtually identical table, then switch the data from the old table (partition) to the new table (partition). The trick is that while in order to do the switch almost everything has to be identical, the properties of the identity column are part of that almost.

I love these types of solutions:  hacks in the most positive connotation of the term.

Comments closed

Understanding Boosted Trees

Maria Jesus Alonso explains decision trees and their subsequent improvements:

Bagging (or Bootsrap Aggregating), the second prediction technique brought to the BigML Dashboard and API, uses a collection of trees (rather than a single one), each tree built with a different random subset of the original dataset for each model in the ensemble. Specifically, BigML defaults to a sampling rate of 100% (with replacement) for each model. This means some of the original instances will be repeated and others will be left out. Bagging performs well when a dataset has many noisy features and only one or two are relevant. In those cases, Bagging will be the best option.

Random Decision Forests extend the Bagging technique by only considering a random subset of the input fields at each split of the tree. By adding randomness in this process, Random Decision Forests help avoid overfitting. When there are many useful fields in your dataset, Random Decision Forests are a strong choice.

Click through for how boosted trees change this model a bit.

Comments closed

Columnstore Partitioning In SQL Server 2016

Niko Neugebauer demonstrates some performance improvements to partitioned columnstore indexes in SQL Server 2016:

266ms was the partitioned table under SQL Server 2016 (compatibility level 120) while 353ms of the total elapsed time was obtained on SQL Server 2014! This represents a solid 25% improvement

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

These improvements were swamped by the aggregate predicate pushdown improvements in 2016, at least in Niko’s example, but I’ll take a free 25%-33% performance improvement.

Comments closed

Using OLS To Fit Rational Functions

Srini Kumar and Bob Horton show how to use the lm function to fit functions using the Pade Approximation:

Now we have a form that lm can work with. We just need to specify a set of inputs that are powers of x (as in a traditional polynomial fit), and a set of inputs that are y times powers of x. This may seem like a strange thing to do, because we are making a model where we would need to know the value of y in order to predict y. But the trick here is that we will not try to use the fitted model to predict anything; we will just take the coefficients out and rearrange them in a function. The fit_pade function below takes a dataframe with x and y values, fits an lm model, and returns a function of x that uses the coefficents from the model to predict y:

The lm function does more than just fit straight lines.

Comments closed

TempDB Spill And Checkpoint Writes

Lonny Niederstadt shows that the -k startup option throttles two separate things:

The -k startup option can throttle checkpoint writes, and can throttle tempdb spills.
On my systems, I’ve never seen an overwhelming checkpoint but I’ve seen plenty of overwhelming spills to tempdb.  But are spill writes through the checkpoint mechanism?  If so, then -k would just be throttling checkpoint writes to persistent databases and to tempdb – same function in two contexts.

Let’s take a look.  I’ll look at the same test scenarios I used in my March 29, 2017 blog post: an “insert into #temptable select…”, a “select… into #temptable”, and an index create with sort_in_tempdb.

For further reference, here is his preliminary research into the -k option.

Comments closed

New RTVS Instructions

Ginger Grant has updated her instructions for installing R Tools for Visual Studio and getting R Services to work on SQL Server:

In addition to having an SQL Server 2016 instance with R Server installed, the following components are needed on a client machine

The Comprehensive R Archive Network

RStudio (optional)

Visual Studio 2015 R Tools

This list is a change from the previous list I have provided as RTVS contains an installation of R Client, there is no need to download that as well. You do not need to download Microsoft R Open if you are using R Server either.  Once RTVS is installed, there is a menu option on the R Tools window. Selecting Install R Client from the menu will handle the information. Unfortunately, there is no change to the menu option once R Client is installed, it always looks like you should install it.  To find out if R Client has been installed, look in the Workspaces window.

In other words, fewer dependencies and an easier installation process.  Read the whole thing to avoid RevoScaleR errors in your code post-upgrade.

Comments closed

Doubly Linked Lists And Bugs

Ewald Cress explains a bug in DBCC PAGE:

Let’s start with the safety convention. The “null” of a null pointer isn’t a magic value, but in real-life implementation is simply zero, which is a perfectly valid virtual address. However, on the premise that trying to access address zero or addresses near it probably indicates a program error, the OS will map that page in such a way that trying to access it causes an access violation. This is not a bug or an accident, but a damn clever feature! Robert Love explains it very nicely over here for Linux, and it applies equally to Windows.

Now recall the convention that trying to retrieve the head or tail of an empty list will – by convention – bring you back a null pointer. When iterating, a related convention may also return a zero when you’ve gone all the way around and come back to the list head. Clearly the onus is on the developer to recognise that null pointer and not dereference it, but attempting to do so sets in motion the safety feature of an access violation, which can then be neatly caught through standard exception handling, for instance yielding a diagnostic stack dump.

Very interesting article, and also a good juxtaposition of supported, “production-safe” code versus undocumented processes.

Comments closed