Press "Enter" to skip to content

Author: Kevin Feasel

From JSON to SQL Server

Phil Factor has some helper functions for us when working with JSON data:

If you know the structure and contents of a JSON document, then it is possible to turn this into one or more relational tables, but even then I dare you to claim that it is easy to tap in a good OpenJSON SELECT statement to do it. If you don’t know what’s in that JSON file, then you’re faced with sweating over a text editor trying to work it all out. You long to just get the contents into a relational table and take it on from there. Even then, you’ve got several struggles before that table appears in the result pane. You must get the path to the tabular data correct, you have to work out the SQL datatypes, and you need to list the full panoply of keys. Let’s face it: it is a chore. Hopefully, all that is in the past with these helper functions.

Click through for those functions.
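For a sense of the chore Phil describes, here is the sort of statement you otherwise write by hand. This is a minimal sketch against an invented JSON document; the path, every key, and every SQL datatype have to be worked out up front:

DECLARE @json nvarchar(max) = N'{
  "orders": [
    { "id": 1, "customer": "Alice", "total": 19.99 },
    { "id": 2, "customer": "Bob",   "total": 42.50 }
  ]
}';

-- The path ($.orders), the key list, and the datatypes all must be correct.
SELECT id, customer, total
FROM OPENJSON(@json, '$.orders')
WITH (
    id       int           '$.id',
    customer nvarchar(100) '$.customer',
    total    decimal(10,2) '$.total'
);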

Automating Hadoop Workflows with Spark and Oozie

Prashanth Jayaram walks us through automating a sample data transfer with tools like Sqoop, Spark, and Oozie:

In the process of building a data product, one ends up applying many resource-intensive analytical operations to a medium-to-large dataset in an efficient way. Apache Spark is the best bet in this scenario, performing faster job execution by caching data in memory and enabling parallelism in distributed data environments.

Components involved in the Spark implementation:

1. Initialize the Spark session using a Scala program
2. Ingest data from the data lake through Hive queries
3. Apply business logic using Scala constructs or Hive queries
4. Load data into HDFS or Hive targets
5. Execute Spark programs through spark-submit
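A minimal Scala sketch of steps 1 through 4 (table names invented; step 5 would be a spark-submit invocation of the compiled jar):

import org.apache.spark.sql.SparkSession

object SampleFlow {
  def main(args: Array[String]): Unit = {
    // 1. Initialize the Spark session, with Hive support enabled.
    val spark = SparkSession.builder()
      .appName("SampleFlow")
      .enableHiveSupport()
      .getOrCreate()

    // 2. Ingest data from the data lake through a Hive query.
    val orders = spark.sql("SELECT * FROM lake.orders")

    // 3. Apply business logic using Scala constructs.
    val bigOrders = orders.filter(orders("total") > 100)

    // 4. Load the result into a Hive target.
    bigOrders.write.mode("overwrite").saveAsTable("mart.big_orders")

    spark.stop()
  }
}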

Read on for a sample flow.

DBCC CHECKDB on Large Databases

Aaron Bertrand shares some thoughts on CHECKDB:

We have a lot of data. Some of that data is stored in large databases (dozens of terabytes each). In some shops, this is an excuse to not run integrity checks. We are not one of those shops.

But we don’t run full CHECKDB operations in production; we have a set of servers dedicated to testing our restores and running checks. We follow a lot of the guidance in these articles:

CHECKDB From Every Angle: Consistency Checking Options for a VLDB

Minimizing the impact of DBCC CHECKDB: DOs and DON’Ts

Minimize performance impact of SQL Server DBCC CHECKDB

Read the whole thing, even if you aren’t dealing with 30+ TB databases.
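The restore-then-check pattern Aaron describes looks roughly like this (a sketch only; database, share, and path names are placeholders):

-- On a server dedicated to testing restores, not in production:
RESTORE DATABASE BigDb
FROM DISK = N'\\backupshare\BigDb_full.bak'
WITH MOVE N'BigDb'     TO N'D:\Data\BigDb.mdf',
     MOVE N'BigDb_log' TO N'L:\Log\BigDb.ldf',
     REPLACE;

-- The restore doubles as a backup test; the check runs where it cannot hurt production.
DBCC CHECKDB (BigDb) WITH NO_INFOMSGS, ALL_ERRORMSGS;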

Creating Users in Azure SQL Database

Kenneth Fisher takes us through a nuance in adding users to Azure SQL Database:

Awesome! I did say I preferred code, didn’t I? I am noticing a slight problem, though: I don’t actually have a login yet. So I look in Object Explorer and there is no instance-level security tab. On top of that, when I try to create a login with code, I get the following error:

Msg 5001, Level 16, State 2, Line 1
User must be in the master database.

Read on for the whole process.
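The short version of that process, assuming the standard approach (names here are invented): CREATE LOGIN only runs while connected to master, and the user then gets created in the user database itself.

-- Connected to the master database:
CREATE LOGIN TestLogin WITH PASSWORD = 'ObviouslyUseAStrongerPassword1!';

-- Connected to the user database:
CREATE USER TestUser FOR LOGIN TestLogin;

-- Or skip the login entirely with a contained database user:
CREATE USER ContainedUser WITH PASSWORD = 'ObviouslyUseAStrongerPassword2!';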

In-Memory OLTP and HammerDB Setup Error

Erik Darling tracks down an error:

This is a short post about a sort of quirky error message I got while trying to run the TPC-C load test against SQL Server’s in-memory tables.

Error in Virtual User1: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The integer value XXXXXXXXXXXX is out of range.
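For context (an illustration of the message, not Erik’s fix), the driver is reporting a value that no longer fits in a 32-bit integer, a boundary easy to see directly in T-SQL:

SELECT CAST(2147483647 AS int);    -- the int maximum: works
SELECT CAST(2147483648 AS int);    -- one past it: Msg 8115, arithmetic overflow
SELECT CAST(2147483648 AS bigint); -- bigint handles values up to 2^63 - 1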

Click through for the solution. Also, read down to the bottom of the post for a huge discount on Erik’s recorded training content. Erik is extremely knowledgeable and has a great way of explaining things, so take advantage of that knowledge.

GIS Capabilities in R

Lionel Hertzog shows off spatial capabilities in R:

All of these operations follow the same logic: st_operation(A, B) checks, for each combination of the geometries in A and B, whether A operation B is true or false. For instance, st_within(A, B) checks whether the geometries in A are within B; this is similar to st_contains(B, A), the difference between the two being the shape of the returned object. If A has n geometries and B has m, st_contains(B, A) returns a list of length m where each element contains the row IDs (numbers between 1 and n) of the geometries in A satisfying the operation. By using sparse=FALSE, the functions return matrices: st_within(A, B, sparse=FALSE) returns an n x m matrix, while st_within(B, A, sparse=FALSE) returns an m x n matrix. Note that running st_operation(A, A) checks the operation between all geometries of the object, returning an n x n matrix.

Click through for part 1 of the series.
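Lionel’s series uses R’s sf package, but if T-SQL is more familiar ground, SQL Server’s spatial types expose the same within/contains duality. A small sketch:

DECLARE @square geometry = geometry::STGeomFromText(
    N'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))', 0);
DECLARE @point geometry = geometry::STGeomFromText(N'POINT(5 5)', 0);

-- STWithin and STContains mirror each other, just as st_within(A, B)
-- mirrors st_contains(B, A) in sf. Both return 1 here.
SELECT @point.STWithin(@square)   AS point_within_square,
       @square.STContains(@point) AS square_contains_point;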

A Primer on Multi-Server SQL Agent Administration

Mikey Bronowski gives us a glimpse at the power of MSX/TSX:

The MSX (master server) can be used to define SQL Agent jobs that will then be pushed over to one or more TSX (target servers). It is possible to see whether a server is an MSX or a TSX when looking at SQL Agent in SQL Server Management Studio’s Object Explorer. The master server can be identified by (MSX) next to SQL Server Agent. It will also have two subfolders under the Jobs folder: Local Jobs (for regular jobs) and Multi-Server Jobs.

This is something woefully few database administrators take advantage of. As the number of servers you have to manage increases, ensuring consistency in SQL Agent jobs becomes more and more difficult. MSX/TSX admittedly has some rough edges, but the tool does a lot.
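To see the moving parts, here is a minimal sketch of defining a multi-server job in msdb (job, step, and server names invented; assumes TSX-SERVER-01 is already enlisted as a target server):

-- On the MSX: create the job and a step.
EXEC msdb.dbo.sp_add_job
    @job_name = N'NightlyIndexMaintenance';

EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'NightlyIndexMaintenance',
    @step_name = N'Rebuild fragmented indexes',
    @command   = N'EXEC dbo.IndexMaintenance;';

-- Target the job at an enlisted TSX rather than the local server;
-- the MSX then pushes the definition out to it.
EXEC msdb.dbo.sp_add_jobserver
    @job_name    = N'NightlyIndexMaintenance',
    @server_name = N'TSX-SERVER-01';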

Statistics Aren’t Guarantees

Brent Ozar lays out an important point:

You might say, “But SQL Server has statistics on those columns, and it knows what the top values are!” Well, that’s true, but…data can change without the statistics being updated. For example, say that one user logs in right now, and then we run the MAX query again:

Statistics tell the engine what they learned at the time they were run. If you need guarantees, that’s what constraints are for.
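To make that contrast concrete, a hypothetical sketch: the CHECK constraint below is enforced on every write, while statistics only reflect the data as of their last update, which sys.dm_db_stats_properties will show you.

-- The constraint is a guarantee, checked on every insert and update.
CREATE TABLE dbo.Users (
    Id             int IDENTITY PRIMARY KEY,
    LastAccessDate datetime2 NOT NULL,
    CONSTRAINT CK_Users_NoFutureLogins CHECK (LastAccessDate <= SYSUTCDATETIME())
);

-- Statistics are a snapshot: last_updated and modification_counter
-- show how stale they might be, not what the data looks like now.
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Users');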
