Press "Enter" to skip to content

Category: T-SQL

Solving the Gaps and Islands Set of Problems

Ed Pollack continues a series on gap and island analysis:

Gaps and islands analysis supplies a mechanism to group data organically in ways that a standard GROUP BY cannot provide. Once we know how to perform an analysis and group data into islands, we can extend this into the realm of real data.

For all code examples in this article, we will use a set of baseball data that I’ve created and maintained over the years. This data is ideal for analytics as it is large and contains data quality that varies between very accurate and very sloppy. As a result, we are forced to consider data quality in our work, as well as scrutinize boundary conditions for correctness. This data will be used without much introduction as we will only reference two tables, and each is relatively straightforward.

The code in this article gets a bit complex, but Ed shows off some powerful techniques.

Comments closed

Contrasting TVPs and Memory-Optimized TVPs

Denis Gobo wants to see what memory-optimized table-valued parameters are good for:

The other day I was thinking about the blog post Faster temp table and table variable by using memory optimization I read a while back. Since you can’t believe anything on the internets (no disrespect to whoever wrote that post) , I decided to take this for a test

In this post I will be creating 2 databases, one is a plain vanilla database and the other, a database that also has a file group that contains memory optimized data

I will also be creating a table type in each database, a plain one and a memory optimized one in the memory optimized database

Read on for Denis’s findings.

Comments closed

Fun with SET Options

Dan Guzman takes us through different SET options in T-SQL and where you can go wrong:

ANSI_PADDING OFF has also been deprecated for quite some time and the SQL Server documentation specifically calls out “ANSI_PADDING should always be set to on.” In summary, a column-level ANSI_PADDING OFF setting causes nullable fixed-length char(n) and binary(n) columns to behave like variable-length varchar(n) and varbinary(n) columns. Furthermore, SQL Server automatically trims trailing blank characters from character data and leading binary zeros from binary data and stores the values as variable length instead of storing the provided value as-is during inserts and updates. Varchar(n)/varbinary(n) columns with ANSI_PADDING OFF are similarly trimmed. Note that it is the persisted ANSI_NULLS column meta-data setting that determines the storage and trimming behavior, not the current session ANSI_PADDING setting. The session ANSI_PADDING must still be ON when using features that require proper settings.

Some of these will pop up in occasional errors, like if you’re using filtered indexes or indexed views.

Comments closed

Qutoed Data and OPENROWSET

Dave Mason wants to remove quoted identifiers from a flat file:

I haven’t shown all the columns, but you get the idea–every column in the result set has data enclosed in double quotes. That’s exactly how it appears in the source data file.

Dave has a method which works for plenty of versions of SQL Server. If you’re using 2017 or later, the FIELDQUOTE parameter was added to solve this problem, though to be fair, I haven’t actually tried it to see if it works as expected.

Comments closed

Not All Cursors are Bad

Erik Darling doesn’t want to mess with your cursors (that much):

Read the code. Understand the requirements.

I tune queries all day long. The number of times someone has said THIS CURSOR IS A REAL BIG PROBLEM and been right is pretty small.

Often, there was a tweak to the cursor options, or a tweak to the query the cursor was calling (or the indexes available to it) that made things run in a more immediate fashion. I want to tune queries, not wrestle with logic that no one understands. Old code is full of that.

I’ll grant the premise (and add my own case where a cursor was necessary to solve the problem), though I did work at one company where the entire product logic was driven by nested cursors 5 or 6 levels deep. Those were really big problems. I think you’ll find the problem most frequently in shops with a heavy dose of Oracle, as Oracle cursors do perform well.

Comments closed

Using the OUTPUT Clause

Eduardo Pivaral takes us through the OUTPUT clause:

Even when the code is easy to read, but if you use this pattern over all your codebase, maintain it can become difficult if you have to change object names or implement it on another system.

T-SQL language provides the OUTPUT clause, that allows you to retrieve information from a DML statement in the same batch.

This is pretty useful for performance tuning in some scenarios, but also for simplifying multi-step processes.

Comments closed

Gap and Island Analysis

Ed Pollack covers a topic of importance for database developers:

Within a data set, an island of data is any ordered sequence where each row is in close proximity to the rows around it. For some data types and analysis, “close proximity” will mean consecutive. Dates, integers, and letters of the alphabet can be ordered sequentially where two adjacent values will not be able to have additional values in between them.

For example, there are no dates between October 23rd and October 24th. Similarly, there are no integers between 17 and 18 and no English letters between E and F. For these examples, an island of data could be defined as a sequence of consecutive values. A gap can be defined as a sequence of missing values.

There are a lot of difficult problems which gap & island analysis makes much easier by pivoting the way you think about the problem.

Comments closed

Short Substrings and Computed Columns

Erik Darling gives us a story about computed columns that turns out not to be about computed columns at all (having thereby subverted our expectations):

The problem is that when I tried to index it:

CREATE INDEX dummy
    ON dbo.Users(DisplayNameComputed);

I got this error:

Msg 537, Level 16, State 3, Line 21
Invalid length parameter passed to the LEFT or SUBSTRING function.

And when I tried to select data from the table, the same error.

Click through to find the real query killer.

Comments closed

Deferred Compilation and Compatibility Level 140 Query Hints

Milos Radivojevic shows that table-valued parameters do not care about your QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 settings:

The plan is created under CL 140, but the estimation number of rows for the table variable is not 1 but the actual one. Even if you would specify the FORCE_LEGACY_CARDINALITY_ESTIMATION hint, the query will be deferred compiled and behavior of table variable would be the same.

Table variable deferred compilation respects settings at the database scope and ignores all hints except one.

Click through to learn what that one hint is.

Comments closed