Bugs and Pitfalls with Non-Determinism

Itzik Ben-Gan has started a new series on T-SQL bugs, pitfalls, and best practices. The first post in this series is on non-deterministic behavior:

A nondeterministic function is a function that given the same inputs, can return different results in different executions of the function. Classic examples are SYSDATETIME, NEWID, and RAND (when invoked without an input seed). The behavior of nondeterministic functions in T-SQL can be surprising to some, and could result in bugs and pitfalls in some cases.

Many people assume that when you invoke a nondeterministic function as part of a query, the function gets evaluated separately per row. In practice, most nondeterministic functions get evaluated once per reference in the query. 

This is the start to a great series.

Changing Constraints in Near-Zero Downtime Situations

I have part six of my interminable series on near-zero downtime deployments:

The locking story is not the same as with the primary and unique key constraints. First, there’s one extra piece: the transition will block access to dbo.LookupTable as well as the table we create the constraint on. That’s to keep us from deleting rows in our lookup table before the key is in place.

Second, the locks begin as soon as we hit F5. Even SELECT statements get blocked requesting a LCK_M_SCH_S lock. Bad news, people.

So what can we do to get around this problem? Two routes: the ineffectual way and the ugly way.

Despite my being a ray of sunshine here, you should still check this out. It’s shorter than the average Russian novel, at least.

Gaps and Islands with Dates

Bert Wagner hits one of my favorite topics:

In a traditional gaps and islands problem, the goal is to identify groups of continuous data sequences (islands) and groups of data where the sequence is missing (gaps).

While many people encounter gaps and islands problems when dealing with ranges of dates, and recently I did too but with an interesting twist:

How do you determine gaps and islands of data that has overlapping date ranges?

Check out Bert’s explanation of the solution; it’s a good one.

Performance Testing Simple Scalar UDF Functions

Wayne Sheffield tries out a simple scalar UDF in SQL Server 2019 to see how it performs:

I recently published a post detailing the new Scalar UDF Inlining feature in SQL 2019 here. That post introduced the new feature in a way that I used to compare performance to the other function types, continuing the performance evaluation of functions that I had previously posted here and here. In the Scalar UDF Inlining post, I used a function to strip all non-numeral values from a string, and to return the result. This used the FOR XML output option.

In thinking about how scalar functions are commonly used, I’ve decided to revisit this feature with a simpler function. I will still compare it to all the other types of functions to see how Scalar UDF Inlining compares to the others.

Wayne’s results are music to the product team’s ears, I’m sure.

Comparing STRING_AGG Performance Against FOR XML PATH

Jeffry Schwartz compares the performance of STRING_AGG in SQL Server 2017 versus the tried and true method of FOR XML PATH for string concatenation:

Two different T-SQL methods were used for testing.  The FOR XML PATH solution incorporates the reverse and stuff functions along with FOR XML PATH.  The String_agg solution is much simpler, but is only available on SQL Server 2017 currently.

The FOR XML PATH method has been documented thoroughly over the years in many articles, so these explanations will not be repeated here.  However, two points are worth mentioning.  The first one is that the @strSeparator (in this case ‘ or  ’) string will reside at the end of the constructed string (in this case @strFilter).  Since the last occurrence of @strSeparator must be removed to prevent a syntax error, characters must be dropped from the end of the string.  Although it may seem obvious that four characters (the length of @strSeparator) should be removed, actually only three need be removed because the end of the constructed string is “r” not “ “.  Therefore, the length argument of the left function specification is the length of @strFilter minus three.

I like STRING_AGG a lot, primarily because I can remember it and I can explain it to people relatively new to SQL Server. Read on to see how they both compare in terms of performance.

Querying Cosmos DB From SQL Server

Jovan Popovic shows how you can use the Cosmos DB ODBC driver to perform OPENROWSET queries against Cosmos DB:

Now you need to install ODBC Driver for CosmosDB on the computer where you have SQL Server installed. I’m using Microsoft Azure Cosmos DB ODBC 64-bit.msi for 64-bit Windows – 64-bit versions of Windows 8.1 or later, Windows 8, Windows 7, Windows Server 2012 R2, Windows Server 2012, and Windows Server 2008 R2.

Once you install this driver, you should setup ODBC source in system DSN and test the connection:

If you’re running SQL Server 2019, you can follow Jovan’s first two steps and then create an external data source and table with PolyBase to get to the same results.

Finding Three-Part or Four-Part Names in SQL Server

Louis Davidson shows how we can find three-part or four-part naming in T-SQL code:

In order to make this work, one of the considerations is to eliminate cross database dependencies, as you can’t reference objects that don’t exist in views, and even in stored procedures, which offer delayed resolution of objects, you can’t test the code without the database it is referencing.

In addition, and somewhat more important to the process, is dealing with three part names that reference the name of the database your object is in. During the comparison process, the database can be created with a name that is different from your target database to compare to (referred to as a shadow database.) So if you are in database X and have references to X.schema.table, but the database is generated as X_Shadow, the X. is now a cross database reference rather than the local reference you are desiring.

Four part names to linked servers are a different sort of nightmare, but one that is (hopefully) exceedingly rare. The queries presented will help with this as well.

Louis has a few scripts to help you find these. If your code is in source control already, you could also build a regular expression to search through it.

Enabling and Disabling Triggers

Kenneth Fisher shows how you can enable and disable triggers in SQL Server:

I tend to feel that a lot of people who use triggers don’t really understand them. That said, every now and again you have to deal with them. And in particular (for this post) you might need to disable and then re-enable them. Enabling and disable are identical commands in this case so I’m just going to use the DISABLEversion and you can just replace it with ENABLE as needed.

To be fair, the world might be a better place if we disabled a majority of triggers…

Changing Tables with Limited Downtime

I continue my series on near-zero downtime deployments and this time look at table changes:

The realm of More Significant Changes is not where you often want to be. There’s a lot of scaffolding code you need to write. Basically, suppose you want to make a repair on the 5th story exterior of an 8-story building. You have a couple of options: the YOLO option is to kick everybody out of the building and have people rappel from the top of the building down to the 5th story to make their changes. They need all of the people out of the building because shut up it’s a strained analogy. This approach is kind of inconvenient: people have to stay out of your building until your people are done repairing the exterior. That’s blocking in the database world.

On the other side, you can build a bunch of scaffolding and attach it to the exterior of the building, perform your repairs, and tear down that scaffolding. While the scaffolding is up, people come and go like normal and don’t even think about it. As you tear the scaffolding down, you temporarily block the door for a moment as you’re finishing your work. This is much more convenient for end users and fits the “near-zero downtime” approach we’re taking.

Strained analogies aside, this is a long post on making a series of table-related changes without your end users noticing.

Making Stored Procedure Changes With Limited Downtime

I continue my series on database development in a (near) zero downtime environment:

Versioning a procedure is pretty simple: you create a new procedure with alterations you want. Corporate naming standards where I’m at have you add a number to the end of versioned procedures, so if you have dbo.SomeProcedure, the new version would be dbo.SomeProcedure01. Then, the next time you version, you’ll have dbo.SomeProcedure02 and so on. For frequently-changing procedures, you might get up to version 05 or 06, but in practice, you’re probably not making that many changes to a procedure’s signature. For example, looking at a directory with exactly 100 procedures in it, I see 7 with a number at the end. Two of those seven procedures are old versions of procedures I can’t drop quite yet, so that means that there are only five “unique” procedures that we’ve versioned in a code base which is two years old. Looking at a different part of the code with 879 stored procedures, 95 have been versioned at least once in the 15 or so years of that code base’s existence. The real number is a bit higher than that because we’ve renamed procedures over time and renamings tend to start the process over as we might go from dbo.SomeProcedure04 to dbo.SomeNewProcedure when we redesign underlying tables or make other drastic architectural changes.

The secret is, I’m always versioning.

Categories

March 2019
MTWTFSS
« Feb  
 123
45678910
11121314151617
18192021222324
25262728293031