Press "Enter" to skip to content

Month: April 2018

Nested Loops And Implicit Reordering

Dmitry Piliugin shows how the SQL Server optimizer can end up reordering data in a nested loops join to improve performance:

The purpose is to minimize random access impact. If we perform an Index Seek (with a partial scan, probably) we read the entries in the index order, in our case, in the order of CustomerID, which is clearly seen on the first result set. The index on CustomerID does not cover our query, so we have to ask the clustered index for the column SomeData, and actually, we perform one another seek, seeking by the SalesOrderID column. This is a random seek, so what if, before searching by the SalesOrderID we will sort by that key, and then issue an ordered sequence of Index Seeks, turning the random acces into the sequential one, wouldn’t it be more effective?

Yes, it would in some cases, and that is what “optimized” property tells us about. However, we remember, that it is not necessarily leads to the real reordering. As for comparing the real impact, I will refer you to the actual Craig’s post or leave it as a homework.

Read the whole thing.  This is one reason why it’s important to emphasize that in SQL, you can only assume order if you have an explicit ORDER BY clause.

Comments closed

Finding Overlapping Data Ranges

Louis Davidson shows how to find groups of data which overlap:

This week, I had a problem where I needed to find and eliminate from the results of my query, data with overlapping ranges. I have written about this topic before, in my database design book book, in regards to building a trigger to avoid overlapping ranges. But even though I have written on the topic there, I still use Google just like you to get quick help (I use books when I want to learn, or expand my knowledge on a topic in depth, blogs when I need a simple answer to a simple or complex question.)

The problem at hand is most often associated with date based data, such as effective dates for a row in a temporal/type 2 dimension table, or other cases like appointment times, etc. But the algorithm is the same with numbers and is a bit easier to read since we don’t have the same issues with roundoff and decimal places (the query is complex enough on its own to show in a blog post). From a progression of start and end values in each row, we are going to look at how to check to make sure that there are no two rows that are in conflict (no range should contain another ranges value at all).

This feels like the type of thing which could be rewritten with window function to be a little smoother, but I’d have to think about it more.  Louis does provide a good solution and explanation to a fairly common but tricky problem in T-SQL.

Comments closed

Upgrading SQL Server 2017 Standard Edition

Jo Douglass hits an error when upgrading to SQL Server 2017 on Standard Edition:

A quick one to signal boost this issue and its solution, as I’m sure other people will run into it. If you’re on Standard Edition of SQL Server and upgrading to 2017, you might run into an issue where the database services portion of the upgrade fails. This seems to be related to SSIS.

If you experience this problem, mid-way through the upgrade you’ll receive this error in a pop-up:

Wait on the Database Engine recovery handle failed. Check the SQL Server error log for potential causes.

At the end of the upgrade, it will show that the database services section has failed. Checking the error log will show this:

Script level upgrade for database ‘master’ failed because upgrade step ‘ISServer_upgrade.sql’ encountered error 917, state 1, severity 15.

Read on for the answer and a workaround.

Comments closed

Using Have I Been Pwned In R

Maelle Salmon shows us how to use the HIBPwned library in R:

The alternative title of this blog post is HIBPwned version 0.1.7 has been released! W00t!. Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful bad news!

This release is mainly a maintenance release, with some cool code changes invisible to you, the user, but not only that: you can now get account_breaches for several accounts in a data.frame instead of a list, and you’ll be glad to know that results are cached inside an active R session. You can read about more functionalities of the package in the function reference.

Wouldn’t it be a pity, though, to echo the release notes without a nifty use case? Another blog post will give more details about the technical aspects of the release, but here, let’s make you curious! How many CRAN package maintainers have been pwned?

Read on to find out that answer.

Comments closed

Jupyter Notebooks In Azure

Steve Jones looks at using Jupyter Notebooks in Azure:

There’s a new feature in Azure, and I stumbled on it when someone posted a link on Twitter. Apologies, I can’t remember who, but I did click on the Azure Notebooks link and was intrigued. I’ve gotten Jupyter notebooks running on my local laptop, but these are often just on one machine. Having a place to share a notebook in the cloud is cool.

Once I clicked on the link, I found these are both R and Python notebooks, as well as F#. These allow you to essentially build a page of code and share it. It’s kind of like a REPL, kind of like a story. It’s a neat way of working through a problem. I clicked the Get Started link to get going and was prompted for a User ID.

I’m a major fan of using notebooks for validating results as well as training people.

Comments closed

Index Design When Handling Sorts

Erik Darling walks us through some of the nuance of index deisgn:

When tuning queries that need to sort large amounts of data, sometimes it makes sense to stick the ordering elements as the leading key column(s) in your index. This allows SQL Server to easily sort your data by that column, and then access other key and included columns to satisfy other parts of the query, whether they’re joins, predicates, or selected columns.

That doesn’t mean that indexes with the sorting element first is always the best idea. Depending on how your query looks, you can sometimes put sorting elements at the end of an index definition and still have your sort be supported.

Read on for an example.

Comments closed

Running The SQL Server Features Discovery Report

Dave Mason shows us how to run the SQL Server features discovery report via command prompt and PowerShell:

I don’t need to validate SQL Server installations on a regular basis. When the need arises, my preference is to run the SQL Server features discovery report. Further, I prefer to run it from the command line. After looking up command line parameters one too many times, I decided to script it out.

It turns out the script commands are a little more complicated than I realized: there is a different setup.exe file for each version of SQL Server installed. I ended up making two script versions: a DOS batch file with hard-coded paths, and a PowerShell script that’s more robust. Check them out and let me know what you think. (Keep scrolling down for a report sample image.)

I’m not sure I’ve ever run that report, but now I know how to do it from Powershell.

Comments closed

Substitution Variables In Power Query

Doug Burke gives us an example of here substitution variables make our lives easier in Power Query:

A substitution variable substitutes a variable (a changing value) to get a different result

    a + b = c (where ‘a’ and ‘b’ are substitution variables that define value ‘c’)

  •         If a = 5 and b = 2 then c = 7
  •         If a = 25 and b = 9 then c = 34
 SubVars are especially helpful when
  • you need to move Power Query files from one file path, folder and file name to something completely different
  • you want to use the same PQ reports for different time periods (ie changing months)

It involves looking for changeable items.  If something may or will change in the future, it’s a candidate for a substitution variable.

  • file paths are good for subVars

  • so are time elements such as Year or Month

  • you can also combine subVars where both the file path and month may change

Click through for several good examples.

Comments closed

What Read Committed Isolation Level Gets You

Paul Randal explains the answer, which is “not much”:

The ‘weird’ behavior is that when the “Batch 2” select completes, after having been blocked by the “Batch 1” transaction, it doesn’t return all 1,000 rows (even though “Batch 1” has completed). Furthermore, depending on when the “Batch 2” select is started, during the 10-seconds that “Batch 1” executes, “Batch 2” returns different numbers of rows. This behavior had also been reported on earlier versions of SQL Server as well. It’s easy to reproduce on SQL Server 2016/2017 and can be reproduced in all earlier versions with a single configuration change (more details in a moment).

Additionally, if the table has a clustered index created, 1,000 rows are returned every time, on all versions of SQL Server.

So why is this weird? Many people expect that all 1,000 rows will be returned every time AND that the structure of the table or the version of SQL Server should not make any difference.

Unfortunately, that assumption is not correct when using read committed.

Read Committed is a trade-off, not an ideal.

Comments closed

The Joys Of Boolean Logic

Jen McCown explains boolean logic via truth tables:

t’s really fine if all the circuitry and algebra stuff makes no sense to you. We’re going to use a tried-and-true method to figuring out how these things will come out.

For a truth table, you just put in every combination of input types – meaning, inputs that will evaluate to true, and those that evaluate to false – and work out how the clause will evaluate it overall.

What do I mean? Well, we have two inputs for the questions above: StatusID and UserForeignID, which I’ll shorten to ID and ForeignIDto save characters. Logically speaking:

  • ID can either be equal to 1, or to a value other than 1.

  • ForeignID can either be equal to TD75R, or to a value other than TD75R.

A logic course is particularly helpful in these cases, but start by reading the whole thing.

Comments closed