Category: T-SQL

Recently, I need to do a data warehouse migration for a client. Since there might be some difference between the Dev environment source databases and Prod environment source databases. The migrated SSIS packages for building data warehouse might have some failures because of the changes. So the challenge is how can I validate all my DW packages (100 +) all at once.

Click through for the script.

Comments closed

APPROX_COUNT_DISTINCT

Published 2018-10-15 by Kevin Feasel

Niko Neugebauer is happy with a new function in SQL Server 2019:

A rather interesting result takes place if we scale our database to 100GB TPCH and run the very same queries – the total elapsed time jumps to 50% difference (from 30%), the CPU execution time difference is kept at 50%, but the memory grant gives the biggest difference ever – those 24.476 MB are still intact for the APPROX_DISTINCT_COUNT, while the COUNT(DISTINCT) asks for just a bit over 11GB ! Besides going through a completely different gateway on the bigger machines, running COUNT(DISTINCT) will bring your system to a full stop way before the same will take place with the APPROX_DISTINCT_COUNT.
Regarding the precision – in my tests I did not see the difference going over 1%.

Test before using this function, but if you don’t the correct number and can make do with “close enough,” this can save a boatload of memory on larger tables.

Comments closed

Simulating LAG And LEAD Prior To SQL Server 2012

Published 2018-10-12 by Kevin Feasel

Izik Ben-Gan highlights a reader submission from his last post:

Last month I covered a Special Islands challenge. The task was to identify periods of activity for each service ID, tolerating a gap of up to an input number of seconds (@allowedgap). The caveat was that the solution had to be pre-2012 compatible, so you couldn’t use functions like LAG and LEAD, or aggregate window functions with a frame. I got a number of very interesting solutions posted in the comments by Toby Ovod-Everett, Peter Larsson, and Kamil Kosno. Make sure to go over their solutions since they’re all quite creative.

Curiously, a number of the solutions ran slower with the recommended index than without it. In this article I propose an explanation for this.

Even though all solutions were interesting, here I wanted to focus on the solution by Kamil Kosno, who’s an ETL developer with Zopa. In his solution, Kamil used a very creative technique to emulate LAG and LEAD without LAG and LEAD. You will probably find the technique handy if you need to perform LAG/LEAD-like calculations using code that is pre-2012 compatible.

Kamil’s solution was quite clever.

Comments closed

New Use Hint In SQL Server 2017 CU10

Published 2018-09-25 by Kevin Feasel

Pedro Lopes shows us a new use hint introduced in SQL Server 2017 CU10:

In this scenario, you only have this one query that apparently does better in SQL Server 2014 than 2017. That’s all “New CE” – there’s no CE70 vs CE 120+ at issue here. Using any known trace flag, the FORCE_LEGACY_CARDINALITY_ESTIMATION hint or the FORCE_DEFAULT_CARDINALITY_ESTIMATION hint doesn’t help. Rewriting the query is an option, but in the interim, I need a quick fix. How?

In SQL Server 2017 CU10, we have introduced a few new USE HINTs: the QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_n, where n is a supported database compatibility level. This forces the query optimizer behavior at a query level, as if the query was compiled with database compatibility level. You can refer to sys.dm_exec_valid_use_hints for a list of currently supported values for n.

So to be clear, the new hint is not forcing only a specific CE model, it’s forcing the equivalent of the specific database compatibility level’s query optimizer behavior, including any query optimizer fixes that are enabled by default in that database compatibility level.

Something to keep in mind, though ideally not something you’d want to use regularly.

Comments closed

What’s In SQL Server 2019 CTP 2.0?

Published 2018-09-25 by Kevin Feasel

Aaron Bertrand gives us the highlights:

Certificate Management in Config Manager View and validate all of your certificates from a single interface, and manage and deploy certificate changes across all of the replicas in an Availability Group or all of the nodes in a Failover Cluster Instance.
Built-in data classification A new ADD SENSITIVITY CLASSIFICATION statement helps you identify and automatically audit sensitive data, a huge step up from the previous SSMS wizard (which just used extended properties).

Aaron also digs into the engine a bit:

APPROX_COUNT_DISTINCT

This new aggregate function is designed for data warehouse scenarios, and is an equivalent for COUNT(DISTINCT()). Instead of performing expensive distinct sort operations to determine actual counts, it relies instead on statistics to get something relatively accurate. You should find that the margin of error is within 2% of the precise count, 97% of the time, which is usually fine for high-level analytics, values that populate a dashboard, or quick estimates.

On my system I created a table with integer columns ranging from 100 to 1,000,000 unique values, and string columns ranging from 100 to 100,000 unique values. There were no indexes other than a clustered primary key on the leading integer column. Here are the results of COUNT(DISTINCT()) vs. APPROX_COUNT_DISTINCT() against those columns, so you can see where it is off by a bit (but always well within 2%):

By the way, APPROX_COUNT_DISTINCT() is a really good idea, and I’m glad it’s here.

Comments closed

Dealing With Large JSON Values

Published 2018-09-20 by Kevin Feasel

Bert Wagner investigates an issue he found where his long JSON strings were becoming NULL in SQL Server:

After a little bit more research, I discovered that the return type for JSON_VALUE is limited to 4000 characters. Since JSON_VALUE is in lax mode by default, if the output has more than 4000 characters, it fails silently.

To force an error in future code I could use SELECT JSON_VALUE(@json, ‘strict $.FiveThousandAs’) so at least I would be notified immediately of an problem with my query/data (via failure).

Although strict mode will notify me of issues sooner, it still doesn’t help me extract all of the data from my JSON property.

Read on for the answer.

Comments closed

Deleting Top Records With An Order By Clause

Published 2018-09-20 by Kevin Feasel

Kenneth Fisher shows that deleting the top N records with an ORDER BY clause is not straightforward:

Did you know you can’t do this?
DELETE TOP (10)
FROM SalesOrderDetail
ORDER BY SalesOrderID DESC;
Msg 156, Level 15, State 1, Line 8
Incorrect syntax near the keyword ‘ORDER’.

I didn’t. Until I tried it anyway. Turns out, it says so right in the limitations section of BOL. Fortunately, that same section does have a recommendation on how to move forward.

Read on for a couple of methods to do this.

Comments closed

Gaps And Islands: Solving Stochastic Islands Problems

Published 2018-09-18 by Kevin Feasel

Itzik Ben-Gan shares with us a special case of the islands problem:

In your database you keep track of services your company supports in a table called CompanyServices, and each service normally reports about once a minute that it’s online in a table called EventLog. The following code creates these tables and populates them with small sets of sample data:

[…]

The special islands task is to identify the availability periods (serviced, starttime, endtime). One catch is that there’s no assurance that a service will report that it’s online exactly every minute; you’re supposed to tolerate an interval of up to, say, 66 seconds from the previous log entry and still consider it part of the same availability period (island). Beyond 66 seconds, the new log entry starts a new availability period. So, for the input sample data above, your solution is supposed to return the following result set (not necessarily in this order):

It’s a neat twist on an old problem.

Comments closed

Ad Hoc Functions In T-SQL

Published 2018-09-17 by Kevin Feasel

Riley Major shows a couple techniques for including ad hoc functions in T-SQL, namely Common Table Expressions and the APPLY operator:

It’s helpful to think of each APPLY as a pipe operation, taking the values from the previous derived table and passing them into the next to be manipulated. Programming T-SQL in this manner (loosely) approximates modern functional programming techniques.

It keeps each step of the logic smaller, so that it’s easier to understand. And you can expose the intermediary columns to help with debugging.

This is one of my favorite uses of the APPLY operator, as it lets you think through a problem step-by-step while still allowing the optimizer to create a set-based solution for you.

Comments closed

Using Table-Valued Parameters With sp_executesql

Published 2018-09-17 by Kevin Feasel

Kenneth Fisher shows how to include table-valued parameters in a dynamic SQL query:

Recently I did a presentation on dynamic SQL. In the presentation I pointed out the similarity of using sp_executesql to creating a stored procedure to do the same task. After the session I was asked: If that’s the case, can I pass a TVP (table valued parameter) into sp_executesql?

Awesome question! Let’s give it a shot.

Read on to see how to do this.

Comments closed