Category: T-SQL

COUNT And NULL Values

Kenneth Fisher digs into the COUNT() function and sees how it deals with NULL values:

Count the number of values
SELECT COUNT(FieldName) FROM TableName;
-- or
SELECT COUNT(ALL FieldName) FROM TableName;

The ALL argument is the default and is unnecessary (I didn’t even know it existed until I started this post). Here you are counting the number of non-NULL values in FieldName. So in a column with (1, NULL, 1, 2, 3, NULL, 1) you’ll get a count of 5. You do get a nice warning (depending on your ANSI_WARNINGS setting) if there were NULL values, though.
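
A quick sketch of that behavior, using a table variable loaded with the values from the excerpt:

DECLARE @t TABLE (FieldName int);
INSERT INTO @t (FieldName) VALUES (1), (NULL), (1), (2), (3), (NULL), (1);

SELECT COUNT(FieldName) AS NonNullValues, -- 5: NULLs are ignored (with a warning)
       COUNT(*)         AS AllRows        -- 7: counts rows, not values
FROM @t;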

By the way, the ALL operator isn’t useful there, but it can be useful along with its counterparts SOME and ANY.  I rarely keep them in mind, so I tend instead to write EXISTS and NOT EXISTS statements, which operate in an equivalent fashion.
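
To make that equivalence concrete, here is a rough sketch (the table and column names are made up for illustration) of an ALL comparison rewritten as NOT EXISTS:

-- Products priced above every competitor's price.
SELECT p.ProductID
FROM dbo.Products AS p
WHERE p.Price > ALL (SELECT c.Price FROM dbo.Competitors AS c);

-- An equivalent NOT EXISTS form. (Equivalent assuming Price is
-- non-nullable; NULLs make the two forms diverge.)
SELECT p.ProductID
FROM dbo.Products AS p
WHERE NOT EXISTS (SELECT 1 FROM dbo.Competitors AS c WHERE c.Price >= p.Price);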

HASHBYTES On FOR JSON PATH Data

Greg Low walks us through a mechanism to check whether data has changed:

In a previous post, I wrote about how to determine if a set of incoming values for a row are different to all the existing values in the row, using T-SQL in SQL Server.

I later remembered that I’d seen a message by Adam Machanic a while back, talking about how FOR JSON PATH might be useful for this, so I did a little more playing around with it.

If you are using SQL Server 2016 or later, I suspect this is a really good option.

Click through to see an example using the WideWorldImporters database.
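
The general shape of the technique (a sketch, not Greg’s exact code, and the column list from WideWorldImporters’ Warehouse.StockItems is my assumption) is to serialize the row’s values with FOR JSON PATH and hash the result:

SELECT si.StockItemID,
       HASHBYTES('SHA2_256',
                 (SELECT si.StockItemName, si.UnitPrice, si.Tags
                  FOR JSON PATH, INCLUDE_NULL_VALUES)) AS RowHash
FROM Warehouse.StockItems AS si;
-- Compare the stored hash against the hash of the incoming values; a
-- change in any serialized column changes the hash.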

Parallelism Strategies For Grouping Operations

Itzik Ben-Gan continues his series on grouping data in SQL Server by looking at how these operations can go parallel:

Besides needing to choose between various grouping and aggregation strategies (preordered Stream Aggregate, Sort + Stream Aggregate, Hash Aggregate), SQL Server also needs to choose whether to go with a serial or a parallel plan. In fact, it can choose between multiple different parallelism strategies. SQL Server uses costing logic that results in optimization thresholds that under different conditions make one strategy preferred to the others. We’ve already discussed in depth the costing logic that SQL Server uses in serial plans in the previous parts of the series. In this section I’ll introduce a number of parallelism strategies that SQL Server can use for handling grouping and aggregation. Initially, I won’t get into the details of the costing logic, rather just describe the available options. Later in the article I’ll explain how the costing formulas work, and an important factor in those formulas called DOP for costing.

As you will later learn, SQL Server takes into account the number of logical CPUs in the machine in its costing formulas for parallel plans. In my examples, unless I say otherwise, I assume the target system has 8 logical CPUs. If you want to try out the examples that I’ll provide, in order to get the same plans and costing values as I do, you need to run the code on a machine with 8 logical CPUs as well. If your machine happens to have a different number of CPUs, you can emulate a machine with 8 CPUs—for costing purposes—like so:

DBCC OPTIMIZER_WHATIF(CPUs, 8);

Even though this tool is not officially documented and supported, it’s quite convenient for research and learning purposes.
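
If you do experiment with it, the community-documented way to clear the override appears to be the ResetAll option (again undocumented, so treat this as an assumption rather than a guarantee):

DBCC OPTIMIZER_WHATIF(ResetAll); -- revert to the machine's real configuration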

This is a fairly long article, but a great one.

Grouping And Aggregating In SQL, R, And Python

Dejan Sarka has a few examples of aggregation in different languages, including SQL, R, and Python:

The query calculates the coefficient of variation (defined as the standard deviation divided by the mean) for the following groups, in the order in which they are listed in the GROUPING SETS clause:

  • Country and education – expression (g.EnglishCountryRegionName, c.EnglishEducation)
  • Country only – expression (g.EnglishCountryRegionName)
  • Education only – expression (c.EnglishEducation)
  • The whole dataset – expression ()

Note also the usage of the GROUPING() function in the query. This function tells you whether a NULL in a cell is there because there were NULLs in the source data (a group NULL), or because the cell is a hyper-aggregate. For example, a NULL in the Education column where GROUPING(Education) equals 1 indicates that the aggregation is one in which education makes no sense in the context, for example aggregation over countries only, or over the whole dataset. I used ordering by NEWID() just to shuffle the results. I executed the query multiple times before I got the desired order, where all possibilities for the GROUPING() function output were included in the first few rows of the result set.
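
A condensed sketch of the kind of query being described, using the AdventureWorksDW-style names from the excerpt (the YearlyIncome measure is my assumption):

SELECT g.EnglishCountryRegionName AS Country,
       c.EnglishEducation AS Education,
       GROUPING(c.EnglishEducation) AS GroupingEducation,
       STDEV(c.YearlyIncome) / AVG(c.YearlyIncome) AS CV
FROM dbo.DimCustomer AS c
INNER JOIN dbo.DimGeography AS g
  ON c.GeographyKey = g.GeographyKey
GROUP BY GROUPING SETS
(
  (g.EnglishCountryRegionName, c.EnglishEducation), -- country and education
  (g.EnglishCountryRegionName),                     -- country only
  (c.EnglishEducation),                             -- education only
  ()                                                -- the whole dataset
)
ORDER BY NEWID(); -- shuffle the result rows, as in the original post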

GROUPING SETS is an underappreciated bit of SQL syntax.

sp_ctrl3 — Getting Information On Database Objects

Daniel Hutmacher shows off his sp_ctrl3 solution:

My absolute number one favorite homebrew tool is without a doubt sp_ctrl3. I started building it a long time ago to replace the built-in sp_help procedure in SQL Server, which is accessible using the Alt+F1 shortcut in Management Studio.

sp_help shows you information on database objects, such as column definitions, parameters, indexes, etc., but it’s old (I remember it in SQL Server 2000, but it’s probably way older than that) and it hasn’t really aged well since then. What’s more, sp_help won’t show you the more technical details and features from newer versions (like filters and included columns on indexes), so you can’t really copy and paste information from it very reliably or effortlessly.

Like the name implies, sp_ctrl3 aims to address some of those issues, and years later, I find myself adding features to it whenever there’s something I miss.
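
Invocation appears to mirror sp_help; this is my reading of the post rather than a verified signature:

EXEC dbo.sp_ctrl3 'dbo.Orders'; -- 'dbo.Orders' is a placeholder object name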

Check it out as a worthy replacement to sp_help.

Exploding Rows With Tally Tables

Riley Major shows how to use a tally table to turn one row into several rows:

Imagine a simple table containing contact information. It has two places for phone numbers (Phone1 and Phone2). Let’s say you wanted to split those off into a related phone numbers table so you could support many more numbers for a single contact. As part of that process, you’d need to create two rows for each existing contact– one for each phone number.

One way to get two phone numbers is to handle each phone number as a separate process, combining a list of all contacts and their first number with those same contacts and their second number. So you’d SELECT everything from the table once for the first phone number and then UNION it with another SELECT. This works, but if you have a large list of columns or a complex set of JOINs, you end up repeating a lot of code.

Instead, what if you simply, magically double the rows and then pick and choose the columns you wanted from each of the rows?

Tally tables are a great “Get out of a bad design (relatively) cheap” tool.  They’re not something I use on a daily basis, but they’ve made life easier for me on dozens of occasions.
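
A minimal sketch of the doubling trick (the contact table and columns are hypothetical stand-ins for the post’s schema):

SELECT c.ContactID,
       n.n AS PhoneSlot,
       CASE n.n WHEN 1 THEN c.Phone1 ELSE c.Phone2 END AS PhoneNumber
FROM dbo.Contacts AS c
CROSS JOIN (VALUES (1), (2)) AS n(n) -- a two-row inline tally table
WHERE CASE n.n WHEN 1 THEN c.Phone1 ELSE c.Phone2 END IS NOT NULL;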

Retrieving Statistic Use From Query Plan XML

Lonny Niederstadt shows us how to retrieve stats usage details from SQL Server query plans if trace flag 8666 is enabled:

Years ago someone said “Hey – why not drop auto-created stats, since the stats you need will just get created again and you’ll end up getting rid of those you no longer need.” That *may* be a reasonable step on some systems. If the risk of bad plans on first execution of a query needing stats that have been dropped is too high, it’s a bad deal. If the potential concurrent cost of auto-creating dropped stats is too high, that’s a bad deal. What about analyzing query plans over some period of time to see which stats are actually used in those plans? Then auto-stats which aren’t used in that set of plans could be dropped.

That type of stats analysis could have other uses, too. Prioritizing manual stats updates in regular maintenance comes to mind. Or determining what stats to create/update on an Always On Availability Group primary based on secondary activity. And troubleshooting problem queries or identifying suspicious “watchlist” stats based on the highly variable queries/plans they are involved with.

So I created this blog post almost 4 years ago. And now I’ll plead with you to not use the query there… it’s awful. If you want to query trace flag 8666 style stats from plan XML, please start from the query in this post instead – it’s much better behaved 🙂

Read on for the script.
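
To give a flavor of the approach, here is a rough sketch (start from Lonny’s linked query for real work; this assumes the Field/wszStatName layout that trace flag 8666 adds to plan XML, and plans must be compiled while the flag is on):

DBCC TRACEON(8666); -- undocumented trace flag

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT st.text AS query_text,
       f.node.value('@FieldValue', 'sysname') AS stat_name
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY qp.query_plan.nodes('//Field[@FieldName="wszStatName"]') AS f(node)
WHERE qp.query_plan IS NOT NULL;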

Updating SQL Agent Job Owners With dbatools

Stuart Moore gives us two methods of updating SQL Agent job owners, one using T-SQL and the other with dbatools:

Now we all know that having SQL Server Agent jobs owned by ‘Real’ users isn’t a good idea. But I don’t keep that close an eye on some of our test instances, so wasn’t surprised when I spotted this showing up in the monitoring:

The job failed. Unable to determine if the owner (OldDeveloper) of job important_server_job has server access (reason: Could not obtain information about Windows NT group/user 'OldDeveloper', error code 0x534. [SQLSTATE 42000] (Error 15404)).

Wanting to fix this as quickly and simply as possible I just wanted to bulk move them to our job owning account (let’s use the imaginative name of ‘JobOwner’).

Click through for both scripts.
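
On the T-SQL side, the standard move is to generate calls to msdb.dbo.sp_update_job, which dbatools wraps as Set-DbaAgentJobOwner. A sketch (not Stuart’s exact script) using the names from the excerpt:

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'EXEC msdb.dbo.sp_update_job @job_name = N'''
            +  REPLACE(j.name, N'''', N'''''')
            +  N''', @owner_login_name = N''JobOwner'';' + NCHAR(13)
FROM msdb.dbo.sysjobs AS j
INNER JOIN sys.server_principals AS p
  ON j.owner_sid = p.sid
WHERE p.name = N'OldDeveloper'; -- the departed job owner

EXEC sys.sp_executesql @sql;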

Performance Tuning Window Functions

Kathi Kellenberger gives us some hints on tuning queries using window functions:

The only way to overcome the performance impact of sorting is to create an index specifically for the OVER clause. In his book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions, Itzik Ben-Gan recommends the POC index. POC stands for (P)ARTITION BY, (O)RDER BY, and (C)overing. He recommends adding any columns used for filtering before the PARTITION BY and ORDER BY columns in the key. Then add any additional columns needed to create a covering index as included columns. Just like anything else, you will need to test to see how such an index impacts your query and overall workload. Of course, you cannot add an index for every query that you write, but if the performance of a particular query that uses a window function is important, you can try out this advice.
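
For instance, a window function partitioned by CustomerID and ordered by OrderDate in a query filtered on Status might suggest an index like this (a hypothetical table, purely to make the pattern concrete):

CREATE NONCLUSTERED INDEX IX_Orders_POC
ON dbo.Orders (Status, CustomerID, OrderDate) -- filter column, then (P), then (O)
INCLUDE (OrderTotal);                         -- (C)overing columns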

There are some good insights here.

Multiple SYSDATETIME In The Same SELECT May Give Unexpected Results

Louis Davidson walks through a scenario he experienced:

The data is exactly as expected, even though the other two calls would have returned .902 and .903 if simply rounded off. On the other hand, looking for differences between the time1_3 and time2_3 columns:

Returns 133 rows. With the sysdatetime values being exactly the same:

But the other columns are incorrect for our needs, as the values are the same:

This was the bug we experienced! Instead of being 1 millisecond different, the values were equal.

Louis’s moral to the story is to test your assumptions.  The more prosaic moral is that calls to get the current time take a non-zero amount of time.
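
A rough repro of that point (not Louis’s exact test): each SYSDATETIME() reference in a statement is evaluated separately, so two calls in the same SELECT can return different values.

SET NOCOUNT ON;
DECLARE @i int = 0, @mismatches int = 0;
WHILE @i < 10000
BEGIN
    IF (SELECT CASE WHEN SYSDATETIME() = SYSDATETIME() THEN 0 ELSE 1 END) = 1
        SET @mismatches += 1;
    SET @i += 1;
END;
SELECT @mismatches AS TimesTheTwoCallsDiffered; -- usually nonzero on systems
-- with a high-resolution clock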
