Category: Syntax

Lets start off with what is a table-valued function (TVF)? A TVF is a dynamic table produced at the time of execution, depending on parameters. Like a view, a TVF creates a result set only when it’s executed, but, unlike a view, it can be parameterized.

You get two types of TVFs, an Inline Table-Valued Function (ITVFs) and Multi-statement Table-Valued Function (MTVFs). I find them easy to remember, think of the “I” in ITVF as 1 (single statement) and the “M” in MTVF as “many” (multiple statements).

As you can imagine, a TVF produces a result set that can be used as a virtual table or view. Yes, you can actually select data from a TVF, or join it with some other tables, views, or even other TVFs. In this post I will go into more detail on ITVFs.

In my experience, the best part about using inline table-valued function is the 3-9x performance improvement you get when removing them and inlining their code. It’s a great idea but the performance cost is just too high for me.

Comments closed

COUNT And NULL Values

Published 2018-07-17 by Kevin Feasel

Kenneth Fisher digs into the COUNT() function and sees how it deals with NULL values:

Count the number of values

1

2

3

SELECT COUNT(FieldName) FROM TableName;

-- or

SELECT COUNT(ALL FieldName) FROM TableName;

The ALL argument is the default and is unnecessary (I didn’t even know it existed until I started this post). Here you are counting the number of non NULL values in FieldName. So in a column with (1, NULL, 1, 2, 3, NULL, 1) you’ll get a count of 5. You do get a nice warning (depending on your ANSI_WARNINGS setting) if there was a NULL value though.

By the way, the ALL operator isn’t useful there, but can be useful along with its counterparts SOME and ANY. I rarely keep them in my mind, so I instead tend to write EXISTS and NOT EXISTS statements which have operate on an equivalent function.

Comments closed

Grouping And Aggregating In SQL, R, And Python

Published 2018-07-10 by Kevin Feasel

Dejan Sarka has a few examples of aggregation in different languages, including SQL, R, and Python:

The query calculates the coefficient of variation (defined as the standard deviation divided the mean) for the following groups, in the order as they are listed in the GROUPING SETS clause:

Country and education – expression (g.EnglishCountryRegionName, c.EnglishEducation)

Country only – expression (g.EnglishCountryRegionName)

Education only – expression (c.EnglishEducation)

Over all dataset- expression ()

Note also the usage of the GROUPING() function in the query. This function tells you whether the NULL in a cell comes because there were NULLs in the source data and this means a group NULL, or there is a NULL in the cell because this is a hyper aggregate. For example, NULL in the Education column where the value of the GROUPING(Education) equals to 1 indicates that this is aggregated in such a way that education makes no sense in the context, for example aggregated over countries only, or over the whole dataset. I used ordering by NEWID() just to shuffle the results. I executed query multiple times before I got the desired order where all possibilities for the GROUPING() function output were included in the first few rows of the result set. Here is the result.

GROUPING SETS is an underappreciated bit of SQL syntax.

Comments closed

Docker Commands For DBAs

Published 2018-06-28 by Kevin Feasel

Marek Masko has some commands which will help DBAs get familiar with Docker:

Stop container

To stop container:

docker stop <container ID>

To stop all running containers:

docker stop $(docker ps -a -q)

Most of the commands are straightforward but it’s nice to have a reference guide.

Comments closed

OFFSET – FETCH Versus ROWNUM In Oracle

Published 2018-06-28 by Kevin Feasel

Lukas Eder compares the OFFSET FETCH logic versus using ROWNUM for grabbing an ordered sub-selection of rows in Oracle:

Now, while the SQL transformation from FETCH FIRST to ROW_NUMBER() filtering is certainly correct, the execution plan doesn’t really make me happy. Consider the ROWNUM based query:

---------------------------------------------------------
| Id  | Operation                     | Name    | Rows  |
---------------------------------------------------------
|   0 | SELECT STATEMENT              |         |       |
|*  1 |  COUNT STOPKEY                |         |       |
|   2 |   VIEW                        |         |     1 |
|   3 |    TABLE ACCESS BY INDEX ROWID| FILM    |  1000 |
|   4 |     INDEX FULL SCAN           | PK_FILM |     1 |
---------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(ROWNUM=1)

And compare that to the FETCH FIRST query:

-------------------------------------------------
| Id  | Operation                | Name | Rows  |
-------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |
|*  1 |  VIEW                    |      |     1 |
|*  2 |   WINDOW SORT PUSHED RANK|      |  1000 |
|   3 |    TABLE ACCESS FULL     | FILM |  1000 |
-------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "FILM"."FILM_ID")<=1)

Lukas digs into this and is not the biggest fan of OFFSET-FETCH. On the SQL Server side, my anecdotal experience has been that it doesn’t perform nearly as well as you’d like either.

Comments closed

Strings And Identifiers

Published 2018-06-21 by Kevin Feasel

Kenneth Fisher explains the difference between a string and an identifier:

A common mistake, and one I make frequently myself is to use a string in place of an identifier, or vise-versa. So to start, let’s have some definitions, shall we?

String

a linear sequence of characters, words, or other data.

Identifier

a sequence of characters used to identify or refer to a program or an element, such as a variable or a set of data, within it.

And because I always find examples fairly useful.

Click through for the example as well as additional explanation.

Comments closed

Identity Columns And Linked Servers

Published 2018-06-15 by Kevin Feasel

Kenneth Fisher points out an oddity when inserting data across a linked server into a table with an identity column:

So far so good. Now let’s throw in a twist. Let’s call it through a linked server.
INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest 
	VALUES ('Col1','Col2');
Msg 213, Level 16, State 1, Line 4
Column name or number of supplied values does not match table definition.

Well that’s a bit odd, right? I mean I used that exact command in the previous test. Turns out that when you do an insert across a linked server that identity column is not ignored. Which means we just need to include the identity value right? Nope.
INSERT INTO [(local)\sql2014cs].Test.dbo.IdentTest 
	VALUES (1,'Col1','Col2');
Msg 7344, Level 16, State 1, Line 4
The OLE DB provider “SQLNCLI11” for linked server “(local)\sql2014cs” could not INSERT INTO table “[(local)\sql2014cs].[Test].[dbo].[IdentTest]” because of column “Id”. The user did not have permission to write to the column.

Click through to see how to do this.

Comments closed

There’s Only One Way To Order

Published 2018-05-31 by Kevin Feasel

Matthew McGiffen notes that there is only one way to order, and that is to use the ORDER BY clause:

Everyone, at the beginning of their SQL career, get’s told that it is important to include an ORDER BY if they want the results ordered. Otherwise the order in which they are returned is not guaranteed.

But then you run queries a lot of times that don’t need a specific order – and you see that they (at least seem to) come out in the same order every time. You could (almost) be forgiven for thinking you can rely on that.

There was even a question on a Microsoft SQL certification exam a few years ago that asked what the default order was for records returned by a simple SELECT – the answer it was looking for was that it would be according to the order of the clustered index. So you can rely on that – right?

Wrong. The question was a bad question, and the answer was incorrect. Let’s look at this in action.

Order is never guaranteed to be stable unless you specify a unique ordering using ORDER BY.

Comments closed

Grouping By Nothing In SQL

Published 2018-05-30 by Kevin Feasel

Lukas Eder points out a subtlety of the GROUP BY clause:

SELECT count(*)
FROM film
GROUP BY ()
This will yield:
count |
------|
1000  |
What’s the point, you’re asking? Can’t we just omit the GROUP BY clause? Of course, this will yield the same result:
SELECT count(*)
FROM film
Yet, the two versions of the query are subtly different.

Great post and also shows a case when GROUP BY () isn’t supported.

Comments closed

Deferred Name Resolution In SQL Server

Published 2018-05-23 by Kevin Feasel

Kendra Little explains the concept of deferred name resolution in SQL Server:

In this case, I’m creating a temporary stored procedure (out of laziness, it means I don’t have to clean up a quick demo) –
CREATE OR ALTER PROCEDURE #test
AS
IF 1=0
    EXECUTE dbdoesnotexist.dbo.someproc;
GO
The database dbdoesnotexist does NOT exist, but I’m still allowed to create the procedure.

When I do so, I get an informational message:

The module ‘#test’ depends on the missing object ‘dbdoesnotexist.dbo.someproc’. The module will still be created; however, it cannot run successfully until the object exists.

This can be useful in some cases where you’ll be querying a table or procedure that may not exist all the time, but which will exist when a certain code block is run.

But, as Kendra points out, deferred name resolution doesn’t work everywhere, so it’s important to know the rules around when it will or will not work.

Comments closed