Grant Fritchey gets forum-sniped:
Recently I was looking through DBA.StackExchange when I saw a pretty simple question that I decided to answer. I went off, set up a test database and some test tables and quickly wrote a query to answer the question. I got it all formatted pretty and was on my way to post it when I saw that another answer was already there.
Yeah. Identical to mine. Almost line for line.
Well, nuts.
I know. I’ll write a blog post.
In thinking about the problem, the thing that stuck in my mind was Grant’s comment about poor design. This got me thinking about one of my favorite topics: orthogonal design for relational excellence. The idea of a BETWEEN table of [ MinValue : MaxValue ] is the first thing people think of but is also the worst because you have two big problems: gaps and overlap.
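To make the gaps-and-overlap problem concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names (RankRange, MainTable), the rank labels, and the boundaries are all made up for illustration and are not taken from Grant's example.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical BETWEEN-style range table with deliberately sloppy boundaries.
    CREATE TABLE RankRange (MinValue INTEGER, MaxValue INTEGER, RankDesc TEXT);
    INSERT INTO RankRange VALUES
        (1,   300,  'Low'),
        (300, 600,  'Medium'),   -- 300 is also covered by 'Low': overlap
        (650, 1000, 'High');     -- nothing covers 601 through 649: gap

    CREATE TABLE MainTable (Value INTEGER);
    INSERT INTO MainTable VALUES (150), (300), (625);
""")

# Count how many ranges each value lands in; anything other than 1 is a problem.
rows = conn.execute("""
    SELECT m.Value, COUNT(r.RankDesc) AS Matches
    FROM MainTable m
    LEFT JOIN RankRange r
        ON m.Value BETWEEN r.MinValue AND r.MaxValue
    GROUP BY m.Value
    ORDER BY m.Value
""").fetchall()

print(rows)
```

With this data, 150 matches one range, 300 matches two, and 625 matches none. Nothing in the table structure itself stops either mistake; you would need extra constraints or triggers to police the boundaries.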
The second solution is to use MinValue and calculate MaxValue (if we actually need it) as LEAD(MinValue) OVER (ORDER BY MinValue) - e, where e represents the smallest reasonable increment we’d need. Queries would find, for each Value in the main table, the largest MinValue at or below Value. That removes gaps and overlap but might be a performance concern as the main table’s data size grows.
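A rough sketch of the MinValue-only design, again using Python's sqlite3 module in place of SQL Server. The RankStart and MainTable names, the rank labels, and the boundaries are hypothetical, and e is assumed to be 1 because the values are integers. SQLite needs version 3.25 or later for LEAD; in T-SQL the lookup would use TOP (1) or CROSS APPLY rather than LIMIT.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical range table: only the lower bound is stored.
    CREATE TABLE RankStart (MinValue INTEGER PRIMARY KEY, RankDesc TEXT);
    INSERT INTO RankStart VALUES (1, 'Low'), (301, 'Medium'), (601, 'High');

    CREATE TABLE MainTable (Value INTEGER);
    INSERT INTO MainTable VALUES (150), (300), (301), (625), (1000);
""")

# Derive MaxValue only when needed: the next MinValue minus e (e = 1 here).
# The last range has no successor, so its MaxValue comes back as NULL (open-ended).
ranges = conn.execute("""
    SELECT MinValue,
           LEAD(MinValue) OVER (ORDER BY MinValue) - 1 AS MaxValue,
           RankDesc
    FROM RankStart
    ORDER BY MinValue
""").fetchall()

# For each Value, pick the row with the largest MinValue at or below it.
ranked = conn.execute("""
    SELECT m.Value,
           (SELECT r.RankDesc
            FROM RankStart r
            WHERE r.MinValue <= m.Value
            ORDER BY r.MinValue DESC
            LIMIT 1) AS RankDesc
    FROM MainTable m
    ORDER BY m.Value
""").fetchall()

print(ranges)
print(ranked)
```

Because each range ends where the next one begins, there is no way to store a gap or an overlap. The cost is the correlated lookup per row, which is where the performance concern comes from as the main table grows.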
The big-brain solution, which generally works best when you have a discrete number of options, could be a tally table. In Grant’s example, we see values from 1 to 1000, with a rank for each. If it’s really as simple as that, we would create a new lookup table with Value + RankDesc and simply join the main table’s Value to the lookup table’s Value to get the appropriate RankDesc. Yeah, you have 1000 rows instead of 3 but queries are trivial at that point. The downside is that this approach doesn’t work for continuous variables (e.g., give me the exact amount of your household income for the prior tax year) and the utility of this solution probably breaks down once you get past tens of thousands of rows.
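The lookup-table approach might look something like the following, once more using Python's sqlite3 module as a stand-in for SQL Server. RankLookup, MainTable, the rank_for helper, and the 300/600 cutoffs are invented for this sketch; only the 1 to 1000 value range and the three ranks come from the discussion above.

```python
import sqlite3

def rank_for(value):
    """Hypothetical rank rule, used only to populate the lookup table once."""
    if value <= 300:
        return "Low"
    if value <= 600:
        return "Medium"
    return "High"

# In-memory SQLite database standing in for SQL Server (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RankLookup (Value INTEGER PRIMARY KEY, RankDesc TEXT NOT NULL);
    CREATE TABLE MainTable (Value INTEGER);
    INSERT INTO MainTable VALUES (1), (300), (301), (1000);
""")

# One row per possible value: 1000 rows instead of 3.
conn.executemany(
    "INSERT INTO RankLookup (Value, RankDesc) VALUES (?, ?)",
    [(v, rank_for(v)) for v in range(1, 1001)],
)

row_count = conn.execute("SELECT COUNT(*) FROM RankLookup").fetchone()[0]

# The query side collapses to a plain equality join on Value.
ranked = conn.execute("""
    SELECT m.Value, l.RankDesc
    FROM MainTable m
    JOIN RankLookup l ON l.Value = m.Value
    ORDER BY m.Value
""").fetchall()

print(row_count)
print(ranked)
```

All of the range logic lives in the one-time load, so the join never has to compare against boundaries, and the primary key on Value rules out both gaps (within the loaded range) and overlap.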
In the case of a continuous variable or an enormous discrete variable, we have the simplest option of all: ignore something. If you care about the range, use the table from the second solution and use that ID on the main table. If you care about the value but not the range, just have the value and no lookup table.