Press "Enter" to skip to content

Category: T-SQL

Useful Azure SQL Database T-SQL Statements

Arun Sirpal has a few T-SQL statements to help manage an Azure SQL Database database:

Creating a database

Very easy to create something like the below – an S2 database.

CREATE DATABASE [MeeTwoDB]
( MAXSIZE = 5GB, EDITION = 'standard', SERVICE_OBJECTIVE = 'S2' );

-- checking details
SELECT Edition          = DATABASEPROPERTYEX('MeeTwoDB', 'Edition'),
       ServiceObjective = DATABASEPROPERTYEX('MeeTwoDB', 'ServiceObjective');

Read on for several more examples.
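As a related sketch (not taken from Arun's excerpt, so treat the exact options as an assumption), the same edition and service objective settings can be changed after the fact with ALTER DATABASE:

ALTER DATABASE [MeeTwoDB]
MODIFY ( EDITION = 'standard', SERVICE_OBJECTIVE = 'S1' );

-- the DATABASEPROPERTYEX query above reflects the new tier once the scale operation completes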

Comments closed

Views Don’t Improve Performance

Grant Fritchey lays down the law on views:

One day, it’s going to happen. I’m going to hear some crazy theory about how SQL Server works and I’m going to literally explode. Instead of some long silly rant with oddball literary & pop culture references you’ll get a screed the size of Ulysses (and about as much fun to read). However, for the moment, like Robin Williams describing a dance move, I’m going to keep it all inside. Here’s our query:

No, no where clause because we have to compare this to this, our view:

Grant used up much of his strategic reserve of GIFs in that post, so check it out.
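Grant's actual query and view aren't reproduced in the excerpt, but the underlying point is that a non-indexed view is just stored query text: the optimizer expands it into the outer statement, so selecting from the view produces the same plan and the same performance as running its definition directly. A hypothetical illustration (table and column names are my assumptions, not Grant's code):

CREATE OR ALTER VIEW dbo.UserPostCounts
AS
SELECT u.Id, u.DisplayName, COUNT(p.Id) AS PostCount
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id
GROUP BY u.Id, u.DisplayName;
GO

-- these two statements compile to the same execution plan
SELECT Id, DisplayName, PostCount
FROM dbo.UserPostCounts;

SELECT u.Id, u.DisplayName, COUNT(p.Id) AS PostCount
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id
GROUP BY u.Id, u.DisplayName;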

Comments closed

Deleting A Small Number Of Rows From A Big Table

Brent Ozar has a tip for deleting a relatively small percentage of data from a very large table:

Say you’ve got a table with millions or billions of rows, and you need to delete some rows. Deleting ALL of them is fast and easy – just do TRUNCATE TABLE – but things get much harder when you need to delete a small percentage of them, say 5%.

It’s especially painful if you need to do regular archiving jobs, like deleting the oldest 30 days of data from a table with 10 years of data in it.

The trick is making a view that contains the top, say, 1,000 rows that you want to delete:

Read on for a demo.
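The pattern, sketched against an assumed dbo.Comments table (names are mine, not Brent's demo), looks something like this:

-- a view exposing only the N oldest rows
CREATE OR ALTER VIEW dbo.Comments_ToDelete
AS
SELECT TOP 1000 *
FROM dbo.Comments
ORDER BY CreationDate;
GO

-- delete through the view; repeat until @@ROWCOUNT = 0
DELETE dbo.Comments_ToDelete
WHERE CreationDate < DATEADD(DAY, -30, GETUTCDATE());

Each pass touches only a small slice of the table, which keeps transaction log growth and lock durations modest.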

Comments closed

Scalar Function Blocking

Erik Darling notes that scalar functions can cause multi-table blocking:

Someone had tried to be clever. Looking at the code running, if you’ve been practicing SQL Server for a while, usually means one thing.

A Scalar Valued Function was running!

In this case, here’s what it looked like:

Someone had added that function as a computed column to the Users table:

Spoilers:  this was a bad idea.
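The code from the excerpt isn't shown here, but the shape of the problem is easy to sketch (names are hypothetical, not Erik's): a scalar UDF that reads another table, wired into a computed column.

-- a scalar function that does data access against Posts
CREATE OR ALTER FUNCTION dbo.TotalScore (@UserId INT)
RETURNS BIGINT
AS
BEGIN
    DECLARE @Total BIGINT;

    SELECT @Total = SUM(p.Score)
    FROM dbo.Posts AS p
    WHERE p.OwnerUserId = @UserId;

    RETURN @Total;
END;
GO

-- added to Users as a (non-persisted) computed column
ALTER TABLE dbo.Users
    ADD TotalPostScore AS dbo.TotalScore(Id);

Any query that touches the computed column now runs the function row by row, taking locks on Posts from inside a query against Users, which is how a single statement ends up blocking across multiple tables.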

Comments closed

Recently Added String Functions

Lori Brown covers a few string functions added to SQL Server in the past two versions:

STRING_ESCAPE (https://docs.microsoft.com/en-us/sql/t-sql/functions/string-escape-transact-sql)

This function is available starting with SQL 2016 and is currently only able to escape JSON characters. To me it’s not super useful just yet but hopefully they will add more types soon.

I haven’t had the need to use STRING_ESCAPE yet, but one additional function I’d add is CONCAT_WS.
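A quick sketch of both (the output in the comments is what I'd expect, and worth verifying on your own instance):

-- STRING_ESCAPE (2016+): escapes characters that are special in JSON
SELECT STRING_ESCAPE(N'C:\temp\"report".json', 'json') AS EscapedPath;
-- C:\\temp\\\"report\".json

-- CONCAT_WS (2017+): concatenates with a separator and skips NULLs
SELECT CONCAT_WS(', ', 'Main St', NULL, 'Springfield', 'IL') AS Address;
-- Main St, Springfield, IL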

Comments closed

Finding Overlapping Data Ranges

Louis Davidson shows how to find groups of data which overlap:

This week, I had a problem where I needed to find and eliminate from the results of my query, data with overlapping ranges. I have written about this topic before, in my database design book, in regards to building a trigger to avoid overlapping ranges. But even though I have written on the topic there, I still use Google just like you to get quick help (I use books when I want to learn, or expand my knowledge on a topic in depth, blogs when I need a simple answer to a simple or complex question.)

The problem at hand is most often associated with date based data, such as effective dates for a row in a temporal/type 2 dimension table, or other cases like appointment times, etc. But the algorithm is the same with numbers and is a bit easier to read since we don’t have the same issues with roundoff and decimal places (the query is complex enough on its own to show in a blog post). From a progression of start and end values in each row, we are going to look at how to check to make sure that there are no two rows that are in conflict (no range should contain another range's value at all).

This feels like the type of thing which could be rewritten with window functions to be a little smoother, but I'd have to think about it more. Louis does provide a good solution and explanation to a fairly common but tricky problem in T-SQL.
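The core test for overlap is symmetric: two ranges conflict when each one starts no later than the other ends. A minimal sketch with made-up numeric data (my illustration, not Louis's query):

CREATE TABLE #Ranges (RangeId INT PRIMARY KEY, StartValue INT, EndValue INT);
INSERT #Ranges VALUES (1, 1, 10), (2, 5, 15), (3, 20, 30);

-- pairs of rows whose ranges overlap
SELECT r1.RangeId, r2.RangeId AS ConflictsWith
FROM #Ranges AS r1
JOIN #Ranges AS r2
  ON r1.RangeId < r2.RangeId
 AND r1.StartValue <= r2.EndValue
 AND r2.StartValue <= r1.EndValue;
-- returns the pair (1, 2); range 3 conflicts with nothing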

Comments closed

Reverse Engineering The Stream Aggregate Algorithm

Itzik Ben-Gan has started a series of articles on optimizing queries which use grouping and aggregating with a reverse-engineering of the stream aggregate algorithm:

As you may already know, when SQL Server optimizes a query, it evaluates multiple candidate plans, and eventually picks the one with the lowest estimated cost. The estimated plan cost is the sum of all the operators’ estimated costs. In turn, each operator’s estimated cost is the sum of the estimated I/O cost and estimated CPU cost. The cost unit is meaningless in its own right. Its relevance is in the comparison that the optimizer makes between candidate plans. That is, the costing formulas were designed with the goal that, between candidate plans, the one with the lowest cost will (hopefully) represent the one that will finish more quickly. A terribly complex task to do accurately!

The more the costing formulas adequately take into account the factors that truly affect the algorithm’s performance and scaling, the more accurate they are, and the more likely that given accurate cardinality estimates, the optimizer will choose the optimal plan. At any rate, if you want to understand why the optimizer chooses one algorithm versus another you need to understand two main things: one is how the algorithms work and scale, and another is SQL Server’s costing model.

So back to the plan in Figure 1; let’s try and understand how the costs are computed. As a policy, Microsoft will not reveal the internal costing formulas that they use. When I was a kid I was fascinated with taking things apart. Watches, radios, cassette tapes (yes, I’m that old), you name it. I wanted to know how things were made. Similarly, I see value in reverse engineering the formulas since if I manage to predict the cost reasonably accurately, it probably means that I understand the algorithm well. During the process you get to learn a lot.

Our query ingests 1,000,000 rows. Even with this number of rows, the I/O cost seems to be negligible compared to the CPU cost, so it is probably safe to ignore it.

As for the CPU cost, you want to try and figure out which factors affect it and in what way.

I give this my highest recommendation.
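If you want something concrete to picture while reading, the kind of statement under discussion is a grouped aggregate like the sketch below (table and index names are assumptions, not Itzik's sample data); with an index ordered on the grouping column, the optimizer can satisfy it with a Stream Aggregate, whose CPU cost grows roughly linearly with the number of input rows.

-- assumes an index on dbo.Orders (custid, orderdate)
SELECT custid, MAX(orderdate) AS maxorderdate
FROM dbo.Orders
GROUP BY custid;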

Comments closed

Query Tuning With The APPLY Operator

Daniel Janik walks through using the APPLY operator to tune a couple of queries:

Recently we were doing a project that heavily focused on query tuning and many tables had various outer joins. My co-worker pointed out that many of these could be converted to an apply rather than a join.

Apply gives you both CROSS and OUTER. Think of CROSS APPLY like an INNER JOIN and OUTER APPLY like an OUTER JOIN.

Let’s compare some code to see how APPLY stacks up.

I like the APPLY operator so much that I created an entire presentation on it.  It’s not a cure-all by any means, but if you understand the intent, you can find places where it improves your code significantly.
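Here is the shape of the rewrite, sketched against assumed Customers and Orders tables (not Daniel's schema): a "latest child row per parent" lookup expressed first as an outer join to a derived table, then as an OUTER APPLY.

-- join form: aggregate the child table, then join back to the parent
SELECT c.CustomerID, c.CustomerName, o.LastOrderDate
FROM dbo.Customers AS c
LEFT JOIN (
    SELECT CustomerID, MAX(OrderDate) AS LastOrderDate
    FROM dbo.Orders
    GROUP BY CustomerID
) AS o
  ON o.CustomerID = c.CustomerID;

-- apply form: evaluate the correlated query once per outer row
SELECT c.CustomerID, c.CustomerName, o.LastOrderDate
FROM dbo.Customers AS c
OUTER APPLY (
    SELECT MAX(o2.OrderDate) AS LastOrderDate
    FROM dbo.Orders AS o2
    WHERE o2.CustomerID = c.CustomerID
) AS o;

Depending on indexing and row counts, one form or the other can produce a markedly better plan, which is why it's worth testing both.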

Comments closed

Bucketing Tables By Size

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.

Click through for the script.  For the R version, this Stack Overflow post shows how to do it with cumulative sums and the cut function.
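One simple way to rough out such buckets (offered as a sketch, not Bill's actual script) is to rank tables by row count and deal them into groups with NTILE:

DECLARE @Buckets INT = 4;

SELECT
    s.name AS SchemaName,
    t.name AS TableName,
    SUM(p.rows) AS ApproxRows,
    NTILE(@Buckets) OVER (ORDER BY SUM(p.rows) DESC) AS BucketNumber
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)   -- heap or clustered index rows only
GROUP BY s.name, t.name;

NTILE balances the number of tables per bucket rather than the total row count, so it's only a rough first cut compared to a proper bin-packing approach.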

Comments closed