Category: T-SQL

Simulating LAG And LEAD Prior To SQL Server 2012

Published 2018-10-12 by Kevin Feasel

Izik Ben-Gan highlights a reader submission from his last post:

Last month I covered a Special Islands challenge. The task was to identify periods of activity for each service ID, tolerating a gap of up to an input number of seconds (@allowedgap). The caveat was that the solution had to be pre-2012 compatible, so you couldn’t use functions like LAG and LEAD, or aggregate window functions with a frame. I got a number of very interesting solutions posted in the comments by Toby Ovod-Everett, Peter Larsson, and Kamil Kosno. Make sure to go over their solutions since they’re all quite creative.

Curiously, a number of the solutions ran slower with the recommended index than without it. In this article I propose an explanation for this.

Even though all solutions were interesting, here I wanted to focus on the solution by Kamil Kosno, who’s an ETL developer with Zopa. In his solution, Kamil used a very creative technique to emulate LAG and LEAD without LAG and LEAD. You will probably find the technique handy if you need to perform LAG/LEAD-like calculations using code that is pre-2012 compatible.

Kamil’s solution was quite clever.

Comments closed

New Use Hint In SQL Server 2017 CU10

Published 2018-09-25 by Kevin Feasel

Pedro Lopes shows us a new use hint introduced in SQL Server 2017 CU10:

In this scenario, you only have this one query that apparently does better in SQL Server 2014 than 2017. That’s all “New CE” – there’s no CE70 vs CE 120+ at issue here. Using any known trace flag, the FORCE_LEGACY_CARDINALITY_ESTIMATION hint or the FORCE_DEFAULT_CARDINALITY_ESTIMATION hint doesn’t help. Rewriting the query is an option, but in the interim, I need a quick fix. How?

In SQL Server 2017 CU10, we have introduced a few new USE HINTs: the QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_n, where n is a supported database compatibility level. This forces the query optimizer behavior at a query level, as if the query was compiled with database compatibility level. You can refer to sys.dm_exec_valid_use_hints for a list of currently supported values for n.

So to be clear, the new hint is not forcing only a specific CE model, it’s forcing the equivalent of the specific database compatibility level’s query optimizer behavior, including any query optimizer fixes that are enabled by default in that database compatibility level.

Something to keep in mind, though ideally not something you’d want to use regularly.

Comments closed

What’s In SQL Server 2019 CTP 2.0?

Published 2018-09-25 by Kevin Feasel

Aaron Bertrand gives us the highlights:

Certificate Management in Config Manager View and validate all of your certificates from a single interface, and manage and deploy certificate changes across all of the replicas in an Availability Group or all of the nodes in a Failover Cluster Instance.
Built-in data classification A new ADD SENSITIVITY CLASSIFICATION statement helps you identify and automatically audit sensitive data, a huge step up from the previous SSMS wizard (which just used extended properties).

Aaron also digs into the engine a bit:

APPROX_COUNT_DISTINCT

This new aggregate function is designed for data warehouse scenarios, and is an equivalent for COUNT(DISTINCT()). Instead of performing expensive distinct sort operations to determine actual counts, it relies instead on statistics to get something relatively accurate. You should find that the margin of error is within 2% of the precise count, 97% of the time, which is usually fine for high-level analytics, values that populate a dashboard, or quick estimates.

On my system I created a table with integer columns ranging from 100 to 1,000,000 unique values, and string columns ranging from 100 to 100,000 unique values. There were no indexes other than a clustered primary key on the leading integer column. Here are the results of COUNT(DISTINCT()) vs. APPROX_COUNT_DISTINCT() against those columns, so you can see where it is off by a bit (but always well within 2%):

By the way, APPROX_COUNT_DISTINCT() is a really good idea, and I’m glad it’s here.

Comments closed

Dealing With Large JSON Values

Published 2018-09-20 by Kevin Feasel

Bert Wagner investigates an issue he found where his long JSON strings were becoming NULL in SQL Server:

After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters. Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently.

To force an error in future code I could use SELECT JSON_VALUE(@json, ‘strict $.FiveThousandAs’) so at least I would be notified immediately of an problem with my query/data (via failure).

Although strict mode will notify me of issues sooner, it still doesn’t help me extract all of the data from my JSON property.

Read on for the answer.

Comments closed

Deleting Top Records With An Order By Clause

Published 2018-09-20 by Kevin Feasel

Kenneth Fisher shows that deleting the top N records with an ORDER BY clause is not straightforward:

Did you know you can’t do this?
DELETE TOP (10)
FROM SalesOrderDetail
ORDER BY SalesOrderID DESC;
Msg 156, Level 15, State 1, Line 8
Incorrect syntax near the keyword ‘ORDER’.

I didn’t. Until I tried it anyway. Turns out, it says so right in the limitations section of BOL. Fortunately, that same section does have a recommendation on how to move forward.

Read on for a couple of methods to do this.

Comments closed

Gaps And Islands: Solving Stochastic Islands Problems

Published 2018-09-18 by Kevin Feasel

Itzik Ben-Gan shares with us a special case of the islands problem:

In your database you keep track of services your company supports in a table called CompanyServices, and each service normally reports about once a minute that it’s online in a table called EventLog. The following code creates these tables and populates them with small sets of sample data:

[…]

The special islands task is to identify the availability periods (serviced, starttime, endtime). One catch is that there’s no assurance that a service will report that it’s online exactly every minute; you’re supposed to tolerate an interval of up to, say, 66 seconds from the previous log entry and still consider it part of the same availability period (island). Beyond 66 seconds, the new log entry starts a new availability period. So, for the input sample data above, your solution is supposed to return the following result set (not necessarily in this order):

It’s a neat twist on an old problem.

Comments closed

Ad Hoc Functions In T-SQL

Published 2018-09-17 by Kevin Feasel

Riley Major shows a couple techniques for including ad hoc functions in T-SQL, namely Common Table Expressions and the APPLY operator:

It’s helpful to think of each APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.

It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.

This is one of my favorite uses of the APPLY operator, as it lets you think through a problem step-by-step while still allowing the optimizer to create a set-based solution for you.

Comments closed

Using Table-Valued Parameters With sp_executesql

Published 2018-09-17 by Kevin Feasel

Kenneth Fisher shows how to include table-valued parameters in a dynamic SQL query:

Recently I did a presentation on dynamic SQL. In the presentation I pointed out the similarity of using sp_executesql to creating a stored procedure to do the same task. After the session I was asked: If that’s the case, can I pass a TVP (table valued parameter) into sp_executesql?

Awesome question! Let’s give it a shot.

Read on to see how to do this.

Comments closed

Multiple Mentions Of A Column In An UPDATE Statement

Published 2018-09-14 by Kevin Feasel

Doug Lane walks us through various scenarios with updates including the same column multiple times:

An application developer came to me with this question recently: “Can I use the same column twice in a SQL UPDATE statement?”

Yes and no.

It depends on what you mean by “use”.

Read on to see what Doug means.

Comments closed

Issues With Bulk Inserting Multi-Byte Characters In Fixed Width Files

Published 2018-09-13 by Kevin Feasel

Randolph West shares an example of an issue with BULK INSERT:

Fellow Canadian Doran Douglas brought this issue to my attention recently, and I wanted to share it with you as well.

Let’s say you have a file in UTF-8 format. What this means is that some of the characters will be single-byte, and some may be more than that.

Where this becomes problematic is that a fixed-width file has fields that are, well, fixed in size. If a Unicode character requires more than one byte, it’s going to cry havoc and let slip the dogs of truncation.

Click through for an example. This seems like a bug to me—I interpret fixed-width as fixed number of characters, not fixed number of bytes. At the very least, it’s liable to cause confusion.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31