Press "Enter" to skip to content

Day: March 12, 2021

Data Paging using Common Table Expressions

Steve Stedman takes us through one method of generating pages of data:

I can remember the first time that I worked on data paging code. I had to page through web site search results 20 at a time on a given web page. My task was to understand how it had been written and to do some bug fixing. After reviewing about 2,000 lines of code, and reviewing the seven different variables that were being used to know the current page, the next page, the previous page, the page size, the row at the top of the page, the row at the bottom of the page, and the number of pages, I finally understood what the code was intended to do. What a mess that was, but beyond the mess, the performance was horrible. The way the page worked was that based on the page you were on, all of the rows would be queried, then a loop would read through all of the rows before the current page, then loop through the rows on the current page displaying them on the page, and finally it would ignore the results after the current page. So page 1 was slow, page 2 was slower than page 1, page 3 was slower than page 2 and on and on.

Since that point I have implemented several different data paging algorithms myself, all better than the original implementation but none as elegant as the CTE way of doing data paging. I used to look at data paging as a painful task, but thanks to the SQL Server implementation of CTEs there is no more pain.

Steve also covers OFFSET and FETCH. This technique won’t be great with enormous data sets, but for moderate-sized data sets which query quickly, it works. This is one area which is quite painful, and the best (and wackiest) solution I’ve come up with in SQL Server when the initial query is quite expensive is to create tables with random names to store results and row numbers, populate a table the first time a query is run, and query that table on subsequent runs, using the RETURN value in a stored procedure to pass along the name of the table to access. Granted, that solution works best with static data and you’d want to have a method to clean up those tables after they’re no longer in use (like storing a list of those tables and their last access dates and times). So it’s a mess.

Comments closed

More Number Series Generator Solutions

Itzik Ben-Gan continues a series on generating rows quickly:

This is the third part in a series about solutions to the number series generator challenge. In Part 1 I covered solutions that generate the rows on the fly. In Part 2 I covered solutions that query a physical base table that you prepopulate with rows. This month I’m going to focus on a fascinating technique that can be used to handle our challenge, but that also has interesting applications well beyond it. I’m not aware of an official name for the technique, but it is somewhat similar in concept to horizontal partition elimination, so I’ll refer to it informally as the horizontal unit elimination technique. The technique can have interesting positive performance benefits, but there are also caveats that you need to be aware of, where under certain conditions it can incur a performance penalty.

This is a great post on a rather complex topic.

Comments closed

Fun with Disabling Joins

David Alcock gives us a “wouldn’t it be neat to see this?” scenario:

If you’ve ever seen my session on Guillotines and the Query Optimiser I demonstrated how we can use query hints to switch off certain transformation rules in a query. Transformation rules use a pattern substitution which essentially swaps one expression for another. In SQL Server terms the best example of this is a join as although we use the logical expressions such as left join or inner join the physical join type used in the execution plan operator is substituted to be something like a Hash Match or Nested Loop Join.

This means we can force the behaviour of an execution plan by disabling certain rules. Now I can’t think of any practical reason for this and remember we can also use join hints in our queries but this is different as it affects the entire optimisation process for a query and not just one join so please follow in a sandbox environment.

In case you haven’t seen David’s session, it’s available on YouTube.

Comments closed

Disliking User-Defined Data Types

Andy Levy has a bone to pick:

Here’s the thing – these types are really just aliases for native types in SQL Server, but more constrained. Constrain yourself to UDTs and you’ll have trouble right-sizing your fields. Let’s say you’ve got three data types for text data:

– myShortString (varchar(10))
– myString (varchar(256))
– myBigString (varchar(8000))

These lengths are not helping anyone. You can’t store email addresses or names in myShortString. But myString is probably way too much for that data. You’re going to waste memory because of how SQL Server estimates memory grants and your indexes will be bloated. Maybe you just need to create more UDTs to cover these situations. But that just compounds the other problems, doesn’t it?

Pushes glasses up the bridge of his nose. Teeeeechnically, an e-mail address may be up to 256 characters long, including a username of up to 64 characters (and maybe two angle brackets, depending on the host). So myString would actually be perfect. Steve Jones has a comment about 300, but that was probably the original standard of 320. Regardless, I realize how far beside the point that is, and Andy’s point is a good one, as well as the several others he makes in the post.

One quick note on defined types: they really do make a lot of sense in a domain-driven design, especially when working with functional programming languages. Defining a CustomerID as an int is fine, but if I know my customer IDs are natural numbers (1, 2, 3, …), 9 digits long, and do not contain the sequence 2345 (because my company considers this an unlucky number sequence), creating a CustomerID type which provides this sort of type checking is great because you keep the rules as close to the data as possible and ensure consistency. It’s also more restrictive than int, so you can cast back down to an int when you’re ready to interact with some remote system. So short answer, do this all day in F#, but not in the database.

Comments closed

In Defense of Float

Hugo Kornelis levies a defense of floating point data types:

Let’s return to the database. Let’s figure out a way to store these numbers appropriately.

Could we use decimal (or its synonym numeric)? Well, yes. We can. We need 25 digits after the decimal place for 0.0000000000000000000004052, and 9 digits before the decimal place for 299900000, so that would fit in a decimal(34,25). But if you try to compute c2 so you can then multiply that to the m, you’ll run into an overflow error.

Hugo does a good job of defending the float data type.

Comments closed